netdev - Re: tcp: rearm RTO timer does not comply with RFC6298

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADVnQy=jwBHg_Pf+puzxTCOCKxZJU2uThAuXU9CtkWFxtqU69w@mail.gmail.com>
Date:   Thu, 21 Jan 2021 10:50:52 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Pengcheng Yang <yangpc@...gsu.com>
Cc:     Yuchung Cheng <ycheng@...gle.com>, Netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: tcp: rearm RTO timer does not comply with RFC6298

On Thu, Jan 21, 2021 at 9:05 AM Pengcheng Yang <yangpc@...gsu.com> wrote:
>
> On Thu, Jan 21, 2021 at 2:59 AM Yuchung Cheng <ycheng@...gle.com> wrote:
> >
> > On Wed, Jan 20, 2021 at 6:59 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> > >
> > > On Wed, Jan 20, 2021 at 5:50 AM Pengcheng Yang <yangpc@...gsu.com> wrote:
> > > >
> > > > hi,
> > > >
> > > > I have a doubt about tcp_rearm_rto().
> > > >
> > > > Early TCP always rearm the RTO timer to NOW+RTO when it receives
> > > > an ACK that acknowledges new data.
> > > >
> > > > Referring to RFC6298 SECTION 5.3: "When an ACK is received that
> > > > acknowledges new data, restart the retransmission timer so that
> > > > it will expire after RTO seconds (for the current value of RTO)."
> > > >
> > > > After ER and TLP, we rearm the RTO timer to *tstamp_of_head+RTO*
> > > > when switching from ER/TLP/RACK to original RTO in tcp_rearm_rto(),
> > > > in this case the RTO timer is triggered earlier than described in
> > > > RFC6298, otherwise the same.
> > > >
> > > > Is this planned? Or can we always rearm the RTO timer to
> > > > tstamp_of_head+RTO?
> > > >
> > > > Thanks.
> > > >
> > >
> > > This is a good question. As far as I can tell, this difference in
> > > behavior would only come into play in a few corner cases, like:
> > >
> > > (1) The TLP timer fires and the connection is unable to transmit a TLP
> > > probe packet. This could happen due to memory allocation failure  or
> > > the local qdisc being full.
> > >
> > > (2) The RACK reorder timer fires but the connection does not take the
> > > normal course of action and mark some packets lost and retransmit at
> > > least one of them. I'm not sure how this would happen. Maybe someone
> > > can think of a case.
>
> Yes, and it also happens when an ACK (a cumulative ACK covered out-of-order data)
> is received that makes ca_state change from DISORDER to OPEN, by calling tcp_set_xmit_timer().
> Because TLP is not triggered under DISORDER and tcp_rearm_rto() is called before the
> ca_state changes.

Hmm, that sounds like a good catch, and potentially a significant bug.
Re-reading the code, it seems that you correctly identify that on an
ACK when reordering is resolved (ca_state change from DISORDER to
OPEN) we will not set a TLP timer for now+TLP_interval, but instead
will set an RTO timer for rtx_head_tx_time+RTO (which could be very
soon indeed, if RTTVAR is very low). Seems like that could cause
spurious RTOs with connections that experience reordering with low RTT
variance.

It seems like we should try to fix this. Perhaps by calling
tcp_set_xmit_timer() only after we have settled on a final ca_state
implied by this ACK (in this case, to allow DISORDER to be resolved to
OPEN). Though that would require some careful surgery, since that
would move the tcp_set_xmit_timer() call *after* the point at which
the RACK reorder timer would be set.

Other thoughts?

neal