[patch] quic PTO counter fixes

Tue Nov 14 13:10:00 UTC 2023

> On 9 Nov 2023, at 19:14, Vladimir Homutov <vl at inspert.ru> wrote:
> 
>> On Thu, Oct 26, 2023 at 03:08:55AM +0400, Sergey Kandaurov wrote:
>>> # HG changeset patch
>>> # User Vladimir Khomutov <vl at wbsrv.ru>
>>> # Date 1697031803 -10800
>>> #      Wed Oct 11 16:43:23 2023 +0300
>>> # Node ID 9ba2840e88f62343b3bd794e43900781dab43686
>>> # Parent  1f188102fbd944df797e8710f70cccee76164add
>>> QUIC: fixed handling of PTO counter.
>>> 
>>> The RFC 9002 clearly says in "6.2. Probe Timeout":
>>>    ...
>>>    As with loss detection, the PTO is per packet number space.
>>>    That is, a PTO value is computed per packet number space.
>>> 
>>> Despite that, current code is using per-connection PTO counter.
>>> For example, this may lead to situation when packet loss at handshake
>>> level will affect PTO calculation for initial packets, preventing
>>> send of new probes.
>> 
>> Although PTO value is per packet number space, PTO backoff is not,
>> see "6.2.1 Computing PTO":
>> 
>> : When ack-eliciting packets in multiple packet number spaces are in flight, the
>> : timer MUST be set to the earlier value of the Initial and Handshake packet
>> : number spaces.
> 
> And I read this fragment as:
>    - there are multiple timer values (i.e. per packet number space)
>      (and value is pto * backoff)
>    - we have to choose the earliest value
> 
> The ngx_quic_pto() has nothing that depends on packet number space
> (with the minor exception that we add max_ack_delay at application level
> after the handshake)
> So pto_count is the only thing that can make timer values to be
> different in different packet number spaces.

While ngx_quic_pto() doesn't depend on space (except max_ack_delay),
it is currently implemented such way that it returns just duration,
and it is the caller's obligation to apply it to the time of the last
ack-eliciting packet (earliest among spaces) to get pto timeout.
This is done in ngx_quic_set_lost_timer() and ngx_quic_pto_handler(),
and seemingly missed for path validation in ngx_quic_set_path_timer()
and for 3*PTO connection tear down in ngx_quic_close_connection().

Probably ngx_quic_pto() should be refactored to include this
"duration => pto_timeout" calculation.  This won't even break its
signature since it's already passed ctx, which seems to be enough
to get the earliest ack-eliciting packet among spaces.

> 
>> 
>> But:
>> 
>> : When a PTO timer expires, the PTO backoff MUST be increased <..>
>> 
>> : This exponential reduction in the sender's rate is important because consecutive
>> : PTOs might be caused by loss of packets or acknowledgments due to severe
>> : congestion.  Even when there are ack-eliciting packets in flight in multiple
>> : packet number spaces, the exponential increase in PTO occurs across all spaces
>> : to prevent excess load on the network.  For example, a timeout in the Initial
>> : packet number space doubles the length of the timeout in the Handshake packet
>> : number space.
> 
> yes, this really looks like contradiction.
> At least I don't understand how it is possible to have PTO value
> different by packet number space given the way we calculate it.
> 
>> Even if that would be proven otherwise, I don't think the description
>> provides detailed explanation.  It describes a pretty specific use case,
>> when both Initial and Handshake packet number spaces have in-flight packets
>> with different PTO timeout (i.e. different f->last).  Typically they are
>> sent coalesced (e.g. CRYPTO frames for ServerHello and (at least)
>> EncryptedExtensions TLS messages).
>> In interop tests, though, it might be different: such packets may be
>> sent separately, with Handshake packet thus having a later PTO timeout.
>> If such, PTO timer will first fire for the Initial packet, then for Handshake,
>> which will result in PTO backoff accumulated for each packet:
>> 
>> t1: <- Initial (lost)
>> t2: <- Handshake (lost)
>> t1': pto(t1) timeout
>>     <- Initial (pto_count=1)
>> t2': pto(t2) timeout
>>     <- Handshake (pto_count=2)
>> t1'': pto(t1') timeout
>>     <- Initial (pto_count=3)
>> 
>> So, I would supplement the description with the phrase that that's
>> fair typically with uncoalesced packets seen in interop tests, and
>> that the same is true vice verse with packet loss at initial packet
>> number space affecting PTO backoff in handshake packet number space.
>> 
>> But see above about PTO backoff increase across all spaces.
> 
> I tend to think that it is better to leave things as is.
> maybe RFC needs some better wording in this case.
> 
> I've checked ngtcp2 and msquic and it it looks like both
> handle pto counter per-connection too;
> (see pto_count in ngtcp2 and QUIC_LOSS_DETECTION.ProbeCount in msquic)
> 

Ok, thanks for deep looking into this.

>>> 
>> 
>> This part of the change to reset pto_count
>> for duplicate ACK or ACK for non-ack-eliciting frame
>> contradicts the OnAckReceived example in RFC 9002,
>> although I didn't found a format text in the RFC itself:
>> 
>> OnAckReceived(ack, pn_space):
>>  ...
>>  // DetectAndRemoveAckedPackets finds packets that are newly
>>  // acknowledged and removes them from sent_packets.
>>  newly_acked_packets =
>>      DetectAndRemoveAckedPackets(ack, pn_space)
>>  // Nothing to do if there are no newly acked packets.
>>  if (newly_acked_packets.empty()):
>>    return
>> 
>>  // Update the RTT if the largest acknowledged is newly acked
>>  // and at least one ack-eliciting was newly acked.
>>  ...
>> 
>>  // Reset pto_count ...
>> 
>> From which it follows that pto_count is reset
>> (and RTT updated) for newly ack'ed packets only.
> 
> yes, agreed.
> the main issue is tracking of in-flight PING frames.
>> 
>> I think the better fix would be to properly track in-flight PING frames.
>> Moreover, the current behaviour of not tracking PING frames in ctx->sent
>> prevents from a properly calculated PTO timeout: each time it is calculated
>> against the original packet (with increasingly receding time to the past)
>> that triggered the first PTO timeout, which doesn't result in exponentially
>> increased PTO period as expected, but rather some bogus value.
> 
> Agree. I've created a patch that fixes this.
> 

(Will reply separately.)

-- 
Sergey Kandaurov