netdev - Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.12+

Message-ID: <1841985.3SsQX9ZHP0@h2o.as.studentenwerk.mhn.de>
Date:	Fri, 05 Dec 2014 13:09:59 +0100
From:	Wolfgang Walter <linux@...m.de>
To:	netdev@...r.kernel.org
Cc:	Thomas Jarosch <thomas.jarosch@...ra2net.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Steffen Klassert <steffen.klassert@...unet.com>
Subject: Re: [bisected] xfrm: TCP connection initiating PMTU discovery stalls on v3.12+

Hello,

Since reverting this patch fixes this rather annoying problem: would it be
dangerous to revert it as a workaround until the root cause is found?


On Monday, 1 December 2014, 17:41:23, Wolfgang Walter wrote:
> On Monday, 1 December 2014, 14:17:28, Wolfgang Walter wrote:
> > On Saturday, 29 November 2014, 12:44:07, Thomas Jarosch wrote:
> > > Hello,
> > > 
> > > we're in the process of updating production level machines
> > > from kernel 3.4.101 to kernel 3.14.25. On one mail server
> > > we noticed that emails destined for an IPSec tunnel sometimes
> > > get stuck in the mail queue with TCP timeouts.
> > > 
> > > > To make a long story short: When the VPN connection is initially
> > > > set up or renewed, the path MTU for the xfrm tunnel is undetermined.
> > > 
> > > As soon as a TCP client starts to send large packets,
> > > it triggers path MTU detection. Some middlebox on the
> > > way to the final server has a lower MTU and sends back
> > > an "ICMP fragmentation needed" packet as normal.
> > > 
> > > With the old kernel, the packet size for the TCP connection inside
> > > the xfrm tunnel gets adjusted and all is fine. With kernel v3.12+,
> > > the connection stalls completely. Same thing with kernel v3.18-rc6.
> > 
> > > We see something similar with a real NIC (RTL8139). In our case only
> > > the first TCP connection that triggers PMTU discovery stalls. Later
> > > TCP connections then work fine.
> > 
> > I will revert that patch and see if that fixes the problem.
> 
> Reverting the commit fixes the problem here, too.
> 
> > > > We wrote a small tool to mimic postfix's TCP behavior (see attached
> > > > file). In the end it's a normal TCP client sending large packets.
> > > > The server side is just "socat - tcp4-listen:667".
> > > > 
> > > > If you run "socket_client" a second time, the path MTU
> > > > for the xfrm tunnel is already known and packets flow normally, too.
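For readers without the attachment, a client of the kind described above
could look roughly like the following. This is a minimal sketch and not the
attached socket_client: the helper names query_max_seg(), connect_ipv4() and
send_chunks() are made up for illustration, and the 4096-byte chunk size and
the log format are only modelled on the sample output quoted further down.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read the socket's current MSS via TCP_MAXSEG.
 * Returns the getsockopt() result (0 on success). */
int query_max_seg(int fd, int *max_seg)
{
	socklen_t len = sizeof(*max_seg);

	return getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, max_seg, &len);
}

/* Hypothetical helper: connect to an IPv4 address given as a dotted quad.
 * Returns a connected socket, or -1 on error. */
int connect_ipv4(const char *ip, unsigned short port)
{
	struct sockaddr_in addr;
	int fd;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: send `rounds` chunks of 4096 bytes and log the
 * send() result and the current MSS after each one.
 * Returns 0 if every send() succeeded, -1 otherwise. */
int send_chunks(int fd, int rounds)
{
	char buf[4096];
	int i;

	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < rounds; i++) {
		int max_seg = 0, res, err;
		ssize_t n;

		errno = 0;
		n = send(fd, buf, sizeof(buf), 0);
		err = errno;
		printf("[%ld SEND result: %zd, strerror: %s]\n",
		       (long)time(NULL), n, strerror(err));

		res = query_max_seg(fd, &max_seg);
		printf("tcp max seg: res: %d, max_seg: %d\n", res, max_seg);
		if (n < 0)
			return -1;
	}
	return 0;
}
```

A caller would connect with connect_ipv4() to the address of the machine
running the socat listener on port 667 and then call send_chunks() in a loop;
a change in the logged max_seg value indicates that the path MTU was lowered
for the connection.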
> > > 
> > > 
> > > The "evil" commit in question is this one:
> > > ---------------------------------------------------------------------
> > > commit 8f26fb1c1ed81c33f5d87c5936f4d9d1b4118918
> > > Author: Eric Dumazet <edumazet@...gle.com>
> > > Date:   Tue Oct 15 12:24:54 2013 -0700
> > > 
> > >     tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs()
> > >     
> > > >     sk_can_gso() should only be used as a hint in tcp_sendmsg() to build
> > > >     GSO packets in the first place. (As a performance hint)
> > > >     
> > > >     Once we have GSO packets in write queue, we can not decide they are
> > > >     no longer GSO only because flow now uses a route which doesn't
> > > >     handle TSO/GSO.
> > >     
> > >     Core networking stack handles the case very well for us, all we need
> > >     is keeping track of packet counts in MSS terms, regardless of
> > >     segmentation done later (in GSO or hardware)
> > >     
> > >     Right now, if  tcp_fragment() splits a GSO packet in two parts,
> > >     @left and @right, and route changed through a non GSO device,
> > >     both @left and @right have pcount set to 1, which is wrong,
> > >     and leads to incorrect packet_count tracking.
> > >     
> > > >     This problem was added in commit d5ac99a648 ("[TCP]: skb pcount with
> > > >     MTU discovery")
> > > 
> > >     Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > >     Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
> > >     Signed-off-by: Yuchung Cheng <ycheng@...gle.com>
> > >     Reported-by: Maciej Żenczykowski <maze@...gle.com>
> > >     Signed-off-by: David S. Miller <davem@...emloft.net>
> > > 
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index 8fad1c1..d46f214 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -989,8 +989,7 @@ static void tcp_set_skb_tso_segs(const struct sock
> > > *sk,
> > > struct sk_buff *skb, /* Make sure we own this skb before messing
> > > gso_size/gso_segs */ WARN_ON_ONCE(skb_cloned(skb));
> > > 
> > > -       if (skb->len <= mss_now || !sk_can_gso(sk) ||
> > > -           skb->ip_summed == CHECKSUM_NONE) {
> > > +       if (skb->len <= mss_now || skb->ip_summed == CHECKSUM_NONE) {
> > > 
> > >                 /* Avoid the costly divide in the normal
> > >                 
> > >                  * non-TSO case.
> > >                  */
> > > 
> > > ---------------------------------------------------------------------
> > > 
> > > > When I revert it, even kernel v3.18-rc6 starts working.
> > > > But I doubt this is the root problem; the revert may just be hiding
> > > > another issue.
> > > 
> > > --- Sample output of socket_client using vanilla v3.12 kernel ---
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1370
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1370
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1370
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1370
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1338
> > > [1417258063 SEND result: 4096, strerror: Success]
> > > tcp max seg: res: 0, max_seg: 1338
> > > *STUCK*
> > > --------------------------------------------------------
> > > 
> > > The "machine" is running on KVM and using "virtio_net" as NIC driver.
> > > I've played with the ethtool offload settings:
> > > 
> > > *** eth1 defaults ***
> > > Offload parameters for eth1:
> > > rx-checksumming: on
> > > tx-checksumming: on
> > > scatter-gather: on
> > > tcp-segmentation-offload: on
> > > udp-fragmentation-offload: on
> > > generic-segmentation-offload: on
> > > generic-receive-offload: on
> > > large-receive-offload: off
> > > 
> > > *** eth1 working (no stalls) using vanilla kernel ***
> > > Offload parameters for eth1:
> > > rx-checksumming: on
> > > tx-checksumming: off  <-- the magic switch
> > > scatter-gather: off
> > > tcp-segmentation-offload: off
> > > udp-fragmentation-offload: off
> > > generic-segmentation-offload: off
> > > generic-receive-offload: off
> > > large-receive-offload: off
> > > 
> > > When I turn "tx-checksumming" back on, it fails again.
> > > Though that is probably also just a side effect.
> > > 
> > > > I can provide tcpdumps if needed, but they are of no real help
> > > > since all you can see is that the kernel stops sending TCP packets
> > > > (and the outgoing TCP packets are encrypted in ESP packets).
> > > 
> > > 
> > > Any vague idea what might be the root cause?
> > > 
> > > > I also tried reverting commit 4d53eff48b5f03ce67f4f301d6acca1d2145cb7a
> > > > ("xfrm: Don't queue retransmitted packets if the original is still on
> > > > the host") but that didn't change the situation. In fact it wasn't even
> > > > triggered.
> > > 
> > > > Please CC me on comments. Thanks.
> > > 
> > > Best regards,
> > > Thomas
> > 
> > Regards,
> 
> Regards,

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts