netdev - Re: TCP stack bug related to F-RTO?

Open Source and information security mailing list archives


Message-ID: <773030.8168.qm@web63404.mail.re1.yahoo.com>
Date:	Fri, 25 Sep 2009 08:58:15 -0700 (PDT)
From:	Joe Cao <caoco2002@...oo.com>
To:	Ray Lee <ray-lk@...rabbit.org>,
	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:	Netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>, caoco2002@...oo.com
Subject: Re: TCP stack bug related to F-RTO?


Hi Ilpo,

Thanks for the reply!  Do you happen to know which patch fixed the problem? Is there a bug tracking system for the Linux kernel?

I studied the FRTO code in the latest kernel, 2.6.31.  It seems the problem is still there:

1. Every time an RTO fires, tcp_use_frto() returns true because tcp_is_sackfrto(tp) returns 1, and the server's TCP enters FRTO.
2. After the head of the write queue is retransmitted, two new data packets are transmitted and the server receives two dup-ACKs.  That makes TCP call tcp_enter_frto_loss(); however, that only resets ssthresh and some other fields.
3. After another, longer RTO fires, tcp_use_frto() again returns true because tcp_is_sackfrto(tp) returns 1, and the stack enters FRTO again.
4. The above repeats, and the stack cannot retransmit the lost packets any faster.
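
To put rough numbers on the loop described above, here is a minimal Python sketch (not kernel code). It assumes only what the trace suggests: each RTO expiry recovers exactly one lost packet, and the timer doubles after every expiry. The function name frto_loop_timeline, its parameters, and the optional rto_cap clamp are hypothetical and exist only for illustration; times are expressed in multiples of the initial RTO.

```python
from itertools import accumulate


def frto_loop_timeline(lost_packets, initial_rto, rto_cap=None):
    """Cumulative time at which each lost packet gets retransmitted.

    Toy model (an assumption, not the kernel's logic): one packet is
    retransmitted per RTO expiry, and the RTO doubles after each expiry.
    rto_cap, if given, clamps the timer to an upper bound.
    """
    waits = []
    rto = initial_rto
    for _ in range(lost_packets):
        waits.append(rto)      # wait one full RTO before this retransmit
        rto *= 2               # exponential back-off
        if rto_cap is not None:
            rto = min(rto, rto_cap)
    return list(accumulate(waits))


# 19 lost packets (#14-#32 in the trace), initial RTO taken as 1 unit.
timeline = frto_loop_timeline(lost_packets=19, initial_rto=1.0)
print(timeline[:3])   # [1.0, 3.0, 7.0]
print(timeline[-1])   # 524287.0
```

Under these assumptions the last packet is retransmitted after (2^19 - 1) times the initial RTO, so the connection would time out long before recovery completes, which matches the RST seen in the trace.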

Is my understanding above correct?

Thanks,
Joe 

--- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:

> From: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Ray Lee" <ray-lk@...rabbit.org>
> Cc: "Joe Cao" <caoco2002@...oo.com>, "Netdev" <netdev@...r.kernel.org>, "LKML" <linux-kernel@...r.kernel.org>, jcaoco2002@...oo.com
> Date: Friday, September 25, 2009, 6:09 AM
> On Thu, 24 Sep 2009, Ray Lee wrote:
> 
> > [adding netdev cc:]
> > 
> > On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao <caoco2002@...oo.com> wrote:
> > >
> > > Hello,
> > >
> > > I have found the following behavior with different versions of the
> > > Linux kernel. The attached pcap trace was collected with the server
> > > (192.168.0.13) running 2.6.24 and shows the problem. Basically the
> > > behavior is like this:
> > >
> > > 1. The client opens up a big window.
> > > 2. The server sends 19 packets in a row (pkt #14-#32 in the trace),
> > >    but all of them are dropped due to some congestion.
> > > 3. The server hits RTO and retransmits pkt #14 in #33.
> > > 4. The client immediately acks #33 (=#14), and the server (which seems
> > >    to enter F-RTO) expands the window and sends *NEW* pkts #35 & #36.
> > >    The timeout is doubled to 2*RTO; the client immediately sends two
> > >    dup-ACKs in response to #35 and #36.
> > > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> > > 6. The client immediately acks #39 (=#15) in #40, and the server
> > >    continues to expand the window and sends two *NEW* pkts #41 & #42.
> > >    Now the timeout is doubled to 4*RTO.
> > > 8. After the 4*RTO timeout, #16 is retransmitted.
> > > 9....
> > > 10. The above steps repeat for retransmitting pkts #16-#32, and each
> > >     time the timeout is doubled.
> > > 11. It takes a very long time to retransmit all the lost packets, and
> > >     before that is done, the client sends a RST because of a timeout.
> > >
> > > The above behavior looks like F-RTO is in effect, and there seems to
> > > be a bug in TCP's congestion control and retransmission algorithm.
> > > Why doesn't TCP on the server (running 2.6.24) enter slow start?
> > > Why should the server take that long to recover from a short period
> > > of packet loss?
> > >
> > > Has anyone else noticed a similar problem before? If my analysis is
> > > wrong, can anyone give me some pointers to what's really wrong and
> > > how to fix it?
> 
> Yes, 2.6.24 is an obsolete version with known flaws in its FRTO
> implementation. The fixes never went into the 2.6.24 stable series, as it
> was _already_ obsolete when the problems were reported and found. The
> correct fixes may be found in 2.6.25.7 (.7 iirc) and are included from
> 2.6.26 onward too.
> 
> Just in case you happen to run an Ubuntu-based kernel from that era (of
> course you should be reporting the bug here then...), a word of warning:
> it seemed nearly impossible for them to get a simple thing like that
> fixed. I haven't checked whether they eventually came to some sensible
> conclusion in that matter or whether it is still unresolved (or, e.g.,
> closed without real resolution).
> 
> -- 
>  i.

