[patch] quic PTO counter fixes
Sergey Kandaurov
pluknet at nginx.com
Tue Nov 14 13:10:00 UTC 2023
> On 9 Nov 2023, at 19:14, Vladimir Homutov <vl at inspert.ru> wrote:
>
>> On Thu, Oct 26, 2023 at 03:08:55AM +0400, Sergey Kandaurov wrote:
>>> # HG changeset patch
>>> # User Vladimir Khomutov <vl at wbsrv.ru>
>>> # Date 1697031803 -10800
>>> # Wed Oct 11 16:43:23 2023 +0300
>>> # Node ID 9ba2840e88f62343b3bd794e43900781dab43686
>>> # Parent 1f188102fbd944df797e8710f70cccee76164add
>>> QUIC: fixed handling of PTO counter.
>>>
>>> The RFC 9002 clearly says in "6.2. Probe Timeout":
>>> ...
>>> As with loss detection, the PTO is per packet number space.
>>> That is, a PTO value is computed per packet number space.
>>>
>>> Despite that, current code is using per-connection PTO counter.
>>> For example, this may lead to situation when packet loss at handshake
>>> level will affect PTO calculation for initial packets, preventing
>>> send of new probes.
>>
>> Although PTO value is per packet number space, PTO backoff is not,
>> see "6.2.1 Computing PTO":
>>
>> : When ack-eliciting packets in multiple packet number spaces are in flight, the
>> : timer MUST be set to the earlier value of the Initial and Handshake packet
>> : number spaces.
>
> And I read this fragment as:
> - there are multiple timer values (i.e. per packet number space)
> (and value is pto * backoff)
> - we have to choose the earliest value
>
> The ngx_quic_pto() has nothing that depends on packet number space
> (with the minor exception that we add max_ack_delay at application level
> after the handshake)
> So pto_count is the only thing that can make timer values to be
> different in different packet number spaces.
While ngx_quic_pto() doesn't depend on space (except max_ack_delay),
it is currently implemented such way that it returns just duration,
and it is the caller's obligation to apply it to the time of the last
ack-eliciting packet (earliest among spaces) to get pto timeout.
This is done in ngx_quic_set_lost_timer() and ngx_quic_pto_handler(),
and seemingly missed for path validation in ngx_quic_set_path_timer()
and for 3*PTO connection tear down in ngx_quic_close_connection().
Probably ngx_quic_pto() should be refactored to include this
"duration => pto_timeout" calculation. This won't even break its
signature since it's already passed ctx, which seems to be enough
to get the earliest ack-eliciting packet among spaces.
>
>>
>> But:
>>
>> : When a PTO timer expires, the PTO backoff MUST be increased <..>
>>
>> : This exponential reduction in the sender's rate is important because consecutive
>> : PTOs might be caused by loss of packets or acknowledgments due to severe
>> : congestion. Even when there are ack-eliciting packets in flight in multiple
>> : packet number spaces, the exponential increase in PTO occurs across all spaces
>> : to prevent excess load on the network. For example, a timeout in the Initial
>> : packet number space doubles the length of the timeout in the Handshake packet
>> : number space.
>
> yes, this really looks like contradiction.
> At least I don't understand how it is possible to have PTO value
> different by packet number space given the way we calculate it.
>
>> Even if that would be proven otherwise, I don't think the description
>> provides detailed explanation. It describes a pretty specific use case,
>> when both Initial and Handshake packet number spaces have in-flight packets
>> with different PTO timeout (i.e. different f->last). Typically they are
>> sent coalesced (e.g. CRYPTO frames for ServerHello and (at least)
>> EncryptedExtensions TLS messages).
>> In interop tests, though, it might be different: such packets may be
>> sent separately, with Handshake packet thus having a later PTO timeout.
>> If such, PTO timer will first fire for the Initial packet, then for Handshake,
>> which will result in PTO backoff accumulated for each packet:
>>
>> t1: <- Initial (lost)
>> t2: <- Handshake (lost)
>> t1': pto(t1) timeout
>> <- Initial (pto_count=1)
>> t2': pto(t2) timeout
>> <- Handshake (pto_count=2)
>> t1'': pto(t1') timeout
>> <- Initial (pto_count=3)
>>
>> So, I would supplement the description with the phrase that that's
>> fair typically with uncoalesced packets seen in interop tests, and
>> that the same is true vice verse with packet loss at initial packet
>> number space affecting PTO backoff in handshake packet number space.
>>
>> But see above about PTO backoff increase across all spaces.
>
> I tend to think that it is better to leave things as is.
> maybe RFC needs some better wording in this case.
>
> I've checked ngtcp2 and msquic and it it looks like both
> handle pto counter per-connection too;
> (see pto_count in ngtcp2 and QUIC_LOSS_DETECTION.ProbeCount in msquic)
>
Ok, thanks for deep looking into this.
>>>
>>
>> This part of the change to reset pto_count
>> for duplicate ACK or ACK for non-ack-eliciting frame
>> contradicts the OnAckReceived example in RFC 9002,
>> although I didn't found a format text in the RFC itself:
>>
>> OnAckReceived(ack, pn_space):
>> ...
>> // DetectAndRemoveAckedPackets finds packets that are newly
>> // acknowledged and removes them from sent_packets.
>> newly_acked_packets =
>> DetectAndRemoveAckedPackets(ack, pn_space)
>> // Nothing to do if there are no newly acked packets.
>> if (newly_acked_packets.empty()):
>> return
>>
>> // Update the RTT if the largest acknowledged is newly acked
>> // and at least one ack-eliciting was newly acked.
>> ...
>>
>> // Reset pto_count ...
>>
>> From which it follows that pto_count is reset
>> (and RTT updated) for newly ack'ed packets only.
>
> yes, agreed.
> the main issue is tracking of in-flight PING frames.
>>
>> I think the better fix would be to properly track in-flight PING frames.
>> Moreover, the current behaviour of not tracking PING frames in ctx->sent
>> prevents from a properly calculated PTO timeout: each time it is calculated
>> against the original packet (with increasingly receding time to the past)
>> that triggered the first PTO timeout, which doesn't result in exponentially
>> increased PTO period as expected, but rather some bogus value.
>
> Agree. I've created a patch that fixes this.
>
(Will reply separately.)
--
Sergey Kandaurov
More information about the nginx-devel
mailing list