[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1803091548010.29120@whs-18.cs.helsinki.fi>
Date: Fri, 9 Mar 2018 16:11:47 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
To: Yuchung Cheng <ycheng@...gle.com>
cc: Neal Cardwell <ncardwell@...gle.com>,
Netdev <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net 4/5] tcp: prevent bogus undos when SACK is not
enabled
On Wed, 7 Mar 2018, Yuchung Cheng wrote:
> On Wed, Mar 7, 2018 at 12:19 PM, Neal Cardwell <ncardwell@...gle.com> wrote:
> >
> > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
> > > A bogus undo may/will trigger when the loss recovery state is
> > > kept until snd_una is above high_seq. If tcp_any_retrans_done
> > > is zero, retrans_stamp is cleared in this transient state. On
> > > the next ACK, tcp_try_undo_recovery again executes and
> > > tcp_may_undo will always return true because tcp_packet_delayed
> > > has this condition:
> > > return !tp->retrans_stamp || ...
> > >
> > > Check for the false fast retransmit transient condition in
> > > tcp_packet_delayed to avoid bogus undos. Since snd_una may have
> > > advanced on this ACK but CA state still remains unchanged,
> > > prior_snd_una needs to be passed instead of tp->snd_una.
> >
> > This one also seems like a case where it would be nice to have a
> > specific packet-by-packet example, or trace, or packetdrill scenario.
> > Something that we might be able to translate into a test, or at least
> > to document the issue more explicitly.
>
> I am hesitate for further logic to make undo "perfect" on non-sack
> cases b/c undo is very complicated and SACK is extremely
> well-supported today. so a trace to demonstrate how severe this issue
> is appreciated.
This is not just some remote corner cases to which I perhaps would
understand your "making undo perfect" comment. Those undos result in
a burst that, at worst, triggers additional buffer overflow because the
correct CC action is cancelled. Unfortunately I don't have now permission
to publish the time-seq graph about it but I've tried to improve the
changelog messages so that you can better understand under which
conditions the problem occurs.
SACK case remains the same even after this change. I did rework the
logic a bit though (pass prior_snd_una rather than flag around) to
make it more obvious but the change is unfortunately lengthy (no matter
what I pass through the call-chain).
--
i.
Powered by blists - more mailing lists