netdev - Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.

Message-ID: <1418135209.14835.17.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Tue, 09 Dec 2014 06:26:49 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Thomas Jarosch <thomas.jarosch@...ra2net.com>
Cc:	Wolfgang Walter <linux@...m.de>, netdev@...r.kernel.org,
	Eric Dumazet <edumazet@...gle.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Steffen Klassert <steffen.klassert@...unet.com>
Subject: Re: [bisected] xfrm: TCP connection initiating PMTU discovery
 stalls on v3.

On Tue, 2014-12-09 at 09:54 +0100, Thomas Jarosch wrote:
> On Monday, 8. December 2014 23:20:42 Wolfgang Walter wrote:
> > On Friday, 5 December 2014, 05:26:25, Eric Dumazet wrote:
> > > On Fri, 2014-12-05 at 13:09 +0100, Wolfgang Walter wrote:
> > > > Hello,
> > > > 
> > > > as reverting this patch fixes this rather annoying problem: is it
> > > > dangerous to revert it as a workaround until the root cause is found?
> > > 
> > > Unfortunately no, this patch fixes a serious issue.
> > > 
> > > We need to find the root cause of your problem instead of trying to work
> > > around it.
> > 
> > I only wanted to use it as local workaround here.
> > 
> > 
> > I looked a bit at the code. I'm not familiar with the network code, though
> > :-).
> 
> If it helps, I'm running the reverted patch on five production boxes hitherto 
> without a hiccup. As far as I understood the original commit message,
> some packet counters might be wrong without it.
> 
> @Eric: What could possibly go wrong(tm)? :)

Crashes in TCP stack, because of packet count mismatches.

The sk_can_gso() status is already tested in tcp_sendmsg(), but only as a
hint, since path behavior can change dynamically on an existing flow:

<start a TCP flow>
ethtool -K eth0 tso off gso off

In this case, core networking stack detects this and segments the
packets _after_ TCP or IP stack, before they reach eth0.
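
To make the fallback described above concrete, here is a minimal Python
sketch of the idea. It is a toy model, not kernel code: the Packet class
and the software_segment() and transmit() functions are hypothetical names
invented for illustration, and the 1448-byte MSS is just an assumed value.
It shows that a large packet built while TSO was enabled can still be split
into MSS-sized pieces below TCP once the device no longer offers TSO.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    payload_len: int  # bytes of TCP payload carried by this packet
    mss: int          # segment size chosen when the packet was built


def software_segment(pkt: Packet) -> List[Packet]:
    """Split one large packet into MSS-sized packets (the last may be shorter)."""
    segments = []
    remaining = pkt.payload_len
    while remaining > 0:
        chunk = min(pkt.mss, remaining)
        segments.append(Packet(chunk, pkt.mss))
        remaining -= chunk
    return segments


def transmit(pkt: Packet, device_has_tso: bool) -> List[Packet]:
    """Return the packets that are actually handed to the device."""
    if device_has_tso or pkt.payload_len <= pkt.mss:
        return [pkt]               # passed down unchanged
    return software_segment(pkt)   # fallback performed below TCP/IP


# The same 4000-byte packet, before and after offload is switched off.
big = Packet(payload_len=4000, mss=1448)
print([p.payload_len for p in transmit(big, device_has_tso=True)])
print([p.payload_len for p in transmit(big, device_has_tso=False)])
```

In this model the sender never needs to be told that the device features
changed; the split happens after the packet has left the sender's hands.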

The TCP stack does not have to know that something changed right before
it hands a GSO packet to the core networking stack; that would be racy by
nature, as TCP does not know or control the full path. Thankfully, we do not
take RTNL for every packet we send in TCP!

It seems XFRM triggers in a slow path something which is not correctly
handled.

It is not correct to add a racy kludge in TCP fast path for this very
unlikely case.

I would disable TSO/GSO on xfrm, and the problem should disappear.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html