netdev - Re: TCP stack bug related to F-RTO?

Message-ID: <619356.98592.qm@web63403.mail.re1.yahoo.com>
Date:	Fri, 25 Sep 2009 09:02:19 -0700 (PDT)
From:	Joe Cao <caoco2002@...oo.com>
To:	zhigang gong <zhigang.gong@...il.com>
Cc:	linux-kernel@...r.kernel.org, jcaoco2002@...oo.com,
	netdev@...r.kernel.org
Subject: Re: TCP stack bug related to F-RTO?

Hi Zhigang,

Thanks for helping to look into the issue.

My answer to your analysis is that of course there won't be a third dup-ack, because the server only sends TWO NEW data packets each time.  Clearly this is the server's problem and not the client's problem.
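
To make the argument concrete, here is a rough sketch of the loop seen in the trace: each round the server waits for the timer, retransmits one lost packet, and then sends only two new packets, so the client can never produce more than two dup-acks. The function name, the 1 s initial RTO and the 120 s cap are assumptions for illustration only; they are not taken from the trace or from any particular stack.

```python
DUPACK_THRESHOLD = 3  # usual fast-retransmit threshold (RFC 5681)


def simulate_recovery(lost_packets, new_per_round=2,
                      initial_rto=1.0, max_rto=120.0):
    """Model the loop from the trace, one timer-driven retransmission per round.

    initial_rto and max_rto are illustrative assumptions, not measured values.
    Returns (total seconds spent waiting on the timer,
             most dup-acks the client could send in any single round).
    """
    rto = initial_rto
    waited = 0.0
    max_dupacks = 0
    for _ in range(lost_packets):
        waited += rto  # the server waits for the retransmission timer
        # The retransmission is acked, then the server sends `new_per_round`
        # new segments. Each one lands above the hole and draws one dup-ack.
        dupacks = new_per_round
        max_dupacks = max(max_dupacks, dupacks)
        if dupacks >= DUPACK_THRESHOLD:
            break  # fast retransmit could take over from here
        rto = min(rto * 2, max_rto)  # timeout doubles, as observed in the trace
    return waited, max_dupacks


if __name__ == "__main__":
    # 19 packets (#14-#32) were lost in the trace.
    print(simulate_recovery(19))
    print(simulate_recovery(19, new_per_round=3))
```

With two new packets per round the dup-ack count stays at two in every round, and under the assumed timer values the 19 lost packets cost roughly 1567 seconds of waiting. With three new packets per round the threshold would be reached after the first timeout.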

Thanks,
Joe

--- On Fri, 9/25/09, zhigang gong <zhigang.gong@...il.com> wrote:

> From: zhigang gong <zhigang.gong@...il.com>
> Subject: Re: TCP stack bug related to F-RTO?
> To: "Joe Cao" <caoco2002@...oo.com>
> Cc: linux-kernel@...r.kernel.org, jcaoco2002@...oo.com, netdev@...r.kernel.org
> Date: Friday, September 25, 2009, 1:55 AM
> Oh, I see, so I spoke too quickly in my last mail. You just left some
> packets out of the trace. I have analysed the traffic flow and have some
> findings below; I hope they are helpful.
> 
> >> > 1. The client opens up a big window,
> >> > 2. the server sends 19 packets in a row (pkt #14-#32 in the trace), but all of them are dropped due to some congestion.
> >> > 3. The server hits RTO and retransmits pkt #14 in #33
> This retransmission timer expiry causes the server's TCP/IP stack to enter
> slow start mode; as a result, we can see the server's sending window
> reduced to one.
> 
> >> > 4. The client immediately acks #33 (=#14), and the server (which seems to enter F-RTO) expands the window and sends *NEW* pkt #35 & #36. Timeout is doubled to 2*RTO; the client immediately sends two dup-acks for #35 and #36.
> 
> The server is still in slow start mode and extends the window to 2.
> 
> >> > 5. After 2*RTO, pkt #15 is retransmitted in #39.
> 
> Here the second retransmission timer expiry occurs. The server's sending
> window is reduced to one again, and it continues in slow start mode.
> 
> >> > 6. The client immediately acks #39 (=#15) in #40, and the server continues to expand the window and sends two *NEW* pkt #41 & #42. Now the timeout is doubled to 4*RTO.
> Here you ignore the two duplicate acks #37 and #38 sent by the client. As far
> as I know, the server must receive three or more duplicate acks before it
> enters fast retransmit mode; otherwise it stays in slow start mode and waits
> for the next retransmission timer expiry before retransmitting the lost
> packets. And this is actually what you got.
> 
> I'm not a kernel expert; I am just analysing this from the TCP protocol
> standard. From my point of view, there is no problem in the server's network
> stack. But there may be some problem on the client (or some intermediate
> network appliance) side, as it always sends just two duplicate acks at a
> time and never sends a third one, no matter how long the interval is. In my
> opinion, if the client could send a third duplicate ack, the server would
> enter fast retransmit mode and then fast recovery, and everything would be OK.
> 
> >> > 8. After 4*RTO timeout, #16 is retransmitted.
> >> > 9....
> >> > 10. The above steps repeat for retransmitting pkt #16-#32, and each time the timeout is doubled.
> >> > 11. It takes a very long time to retransmit all the lost packets, and before that is done, the client sends a RST because of timeout.
> 
> On Fri, Sep 25, 2009 at 2:42 PM, Joe Cao <caoco2002@...oo.com> wrote:
> > Hi,
> >
> > On the wrong tcp checksum, that's because of hardware checksum offload.
> >
> > As for the seq/ack number, because the trace is long, I deliberately removed the irrelevant packets between the three-way handshake and the point where the problem happens. That can be seen from the timestamps.
> >
> > Please also note that I intentionally replaced the IP addresses and mac addresses in the trace to hide proprietary information.
> >
> > Anyway, the problem is not related to the checksum or the seq/ack number; otherwise, you wouldn't see the behavior shown in the trace.
> >
> > Thanks,
> > Joe
> >
> > --- On Thu, 9/24/09, zhigang gong <zhigang.gong@...il.com> wrote:
> >
> 


      
