netdev - Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 15 Jun 2018 13:35:45 +0300 (EEST)
From:   Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
To:     Michal Kubecek <mkubecek@...e.cz>
cc:     Yuchung Cheng <ycheng@...gle.com>, netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are
 disabled

On Fri, 15 Jun 2018, Michal Kubecek wrote:

> On Fri, Jun 15, 2018 at 11:05:03AM +0300, Ilpo Järvinen wrote:
> > On Thu, 14 Jun 2018, Michal Kubecek wrote:
> > 
> > My point was that the new data segment bursts that occur if the sender 
> > isn't application limited indicate that there's something going wrong
> > with FRTO. And that wrong is also what is causing that RTO loop because
> > the sender doesn't see the previous FRTO recovery on second RTO. With 
> > my FRTO undo fix, (new_recovery || icsk->icsk_retransmits) will be false
> > and that will prevent the RTO loop.
> 
> Yes, it would prevent the loop in this case (except it would be a bit
> later, after second RTO rather than after first).

Hmm, I'm actually wrong about the new data missing bit I think. After
reading more code I'm quite sure conventional RTO recovery is triggered 
right away (as long as that bogus undo that ends the recovery wouldn't
occur first like it does without my fix). So no second RTO would be 
needed.

> But I'm not convinced
> the logic of the patch is correct. If I understand it correctly, it
> essentially changes "presumption of innocence" (if we get an ack past
> what we retransmitted, we assume it was received earlier - i.e. would
> have been sacked before if SACK was in use) to "presumption of guilt"
> (whenever a retransmitted segment is acked, we assume nothing else acked
> with it was received earlier). Or that you trade false negatives for
> false positives.

FRTO depends on knowing for sure what packet (original pre-RTO one or 
something that was transmitted post-RTO) triggered the ACK. If FRTO
isn't sure that the ACK was generated by a post-RTO packet, it must
not assume innocence! This change in practice affects just the time while 
the segment rexmitted by RTO is still there, that is, processing in step 2 
(if we get a cumulative ACK beyond it because the next loss is not for the 
subsequent segment but for some later segment, FLAG_ORIG_SACK_ACKED is set 
and we'll incorrectly do step 3b while still in FRTO has only reached step 
2 for real; this is fixed by my patch). ...The decision about 
positive/negative only occurs _after_ that in step 3.

> Maybe I understand it wrong but it seems that you de facto prevent
> Step (3b) from ever happening in non-SACK case.

Only if any of skb that was ACKed had been retransmitted. There shouldn't 
be any in step 3 because the RTO rexmit was ACKed (and also because 
of how new_recovery variable works wrt. earlier recoveries). Thus, in
step 3 the fix won't clear FLAG_ORIG_SACK_ACKED flag allowing FRTO to 
detect spurious RTOs using step 3b.

> > > > No! The window should not update window on ACKs the receiver intends to 
> > > > designate as "duplicate ACKs". That is not without some potential cost 
> > > > though as it requires delaying window updates up to the next cumulative 
> > > > ACK. In the non-SACK series one of the changes is fixing this for
> > > > non-SACK Linux TCP flows.
> > > 
> > > That sounds like a reasonable change (at least at the first glance,
> > > I didn't think about it too deeply) but even if we fix Linux stack to
> > > behave like this, we cannot force everyone else to do the same.
> > 
> > Unfortunately I don't know what the other stacks besides Linux do. But 
> > for Linux, the cause for the changing receiver window is the receiver 
> > window auto-tuning and I'm not sure if other stacks have a similar 
> > feature (or if that affects (almost) all ACKs like in Linux).
> 
> The capture from my previous e-mail and some others I have seen indicate
> that at least some implementations do not take care to never change
> window size when responding to an out-of-order segment.  That means that
> even if we change linux TCP this way (we might still need to send
> a separate window update in some cases), we still cannot rely on others
> doing the same.

Those implementations ignore what is a duplicate ACK (RFC5681, which
is also pointed into by RFC5682 for its defination):
  DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
      "duplicate" ... (e)
      the advertised window in the incoming acknowledgment equals the
      advertised window in the last incoming acknowledgment.

Not sending duplicate ACKs also means that fast recovery will not work
for those flows but that may not show up performance wise as long as you 
have enough capacity for any unnecessary rexmits the forced RTO recovery 
is going to do. RTO recovery may actually improve completion times for 
non-SACK flows as NewReno recovers only one lost pkt/RTT where as RTO 
recovery needs log(outstanding packets) RTTs at worst. For a large number 
of losses in a window, the log is going to win.

> I checked the capture attached to my previous e-mail again and there is
> one thing where our F-RTO implementation (in 4.4, at least) is wrong,
> IMHO. While the first ACK after "new data" (sent in (2b)) was a window
> update (and therefore not dupack by definition) so that we could take
> neither (3a) nor (3b), in some iterations there were further acks which
> did not change window size. The text just before Step 1 says
> 
>   The F-RTO algorithm does not specify actions for receiving
>   a segment that neither acknowledges new data nor is a duplicate
>   acknowledgment.  The TCP sender SHOULD ignore such segments and
>   wait for a segment that either acknowledges new data or is
>   a duplicate acknowledgment.
> 
> My understanding is that this means that while the first ack after new
> data is correctly ignored, the following ack which preserves window size
> should be recognized as a dupack and (3a) should be taken.

Linux FRTO never gets that far (without my fix) if the ACK in step 2
covers beyond the RTO rexmit because 3b is prematurely invoked, that's 
why you never see what would occur if 3a is taken. TCP thinks it's not 
recovering anymore and therefore can send only new data (if there's some 
available).

This is what I tried to tell earlier, with new data there you see there's 
something else wrong too with FRTO besides the RTO loop.


-- 
 i.