[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190715161827.GB2622@TopQuark.net>
Date: Mon, 15 Jul 2019 12:18:27 -0400
From: Paul Donohue <linux-kernel@...lSD.com>
To: David Ahern <dsahern@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
netdev@...r.kernel.org
Subject: IPv6 L2TP issues related to 93531c67
I have a system that establishes four L2TP over IPv6 tunnels using site-local addresses via the following:
ip l2tp add tunnel tunnel_id 1233 peer_tunnel_id 1233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp1 tunnel_id 1233 session_id 1233 peer_session_id 1233
ip link set dev net_l2tp1 up
ip l2tp add tunnel tunnel_id 1235 peer_tunnel_id 1235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp2 tunnel_id 1235 session_id 1235 peer_session_id 1235
ip link set dev net_l2tp2 up
ip l2tp add tunnel tunnel_id 2233 peer_tunnel_id 2233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp3 tunnel_id 2233 session_id 2233 peer_session_id 2233
ip link set dev net_l2tp3 up
ip l2tp add tunnel tunnel_id 2235 peer_tunnel_id 2235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp4 tunnel_id 2235 session_id 2235 peer_session_id 2235
ip link set dev net_l2tp4 up
These tunnels worked fine on kernel 4.4. On kernel 4.15, there was a bug that caused intermittent L2TP packet errors, but everything worked fine after applying 4522a70db7aa5e77526a4079628578599821b193.
However, after upgrading to kernel 4.18 with 4522a70d (or upgrading to kernel 5.0 which includes 4522a70d, or upgrading to the current master kernel branch), two of the four tunnels always fail to work properly after a reboot, although it appears random which two work and which two fail.
When I say "fail to work properly", the problem is that packets generated by the l2tp kernel modules (in response to a packet being sent to the associated net_l2tpX interface) are silently dropped. The l2tp_debugfs kernel module reports that L2TP packets are being transmitted with no errors, iptables counters and nflog rules can be used to confirm that well-formed packets are generated and sent, but tcpdump does not see the packets being sent on any interface on the system. iptables reports that the destination interface of the lost packets is "lo" (which is clearly incorrect and probably an indicator of the underlying issue), but `tcpdump -nnn -i lo` doesn't show any packets. Incoming L2TP packets appear to be processed correctly, only outgoing L2TP packets appear affected.
Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.
IPv4 L2TP tunnels do not appear affected by this issue. Based on a few quick tests, it appears that switching to publicly-routable IPv6 addresses instead of site-local addresses seems to prevent this issue, although I haven't done sufficient testing of this, and it is not clear to me how the code in 93531c67 might be affected by the type of IPv6 address, so this observation may be a red herring. Manually deleting and re-creating a broken interface seems to make it work again, although I have not thoroughly experimented with making changes after boot time to see if the problem is entirely random, if it is based on the number of existing interfaces, if it is based on a boot-time timing issue, etc.
It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed. Could someone take a look and point me in the right direction for further troubleshooting?
Thanks!
Powered by blists - more mailing lists