linux-kernel - [PATCH RFC net] net: Prevent sk_bound_dev_if causing packet to be rerouted back into tunnel

Message-ID: <20250415045051.1913231-1-Thomas.Winter@alliedtelesis.co.nz>
Date: Tue, 15 Apr 2025 16:50:51 +1200
From: Thomas Winter <Thomas.Winter@...iedtelesis.co.nz>
To: steffen.klassert@...unet.com,
	herbert@...dor.apana.org.au,
	davem@...emloft.net,
	dsahern@...nel.org,
	edumazet@...gle.com,
	kuba@...nel.org,
	pabeni@...hat.com,
	horms@...nel.org,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Cc: Thomas Winter <Thomas.Winter@...iedtelesis.co.nz>
Subject: [PATCH RFC net] net: Prevent sk_bound_dev_if causing packet to be rerouted back into tunnel

We have found a situation where packets going into an IPsec tunnel get
encapsulated twice. One example is an ICMP socket bound to a tunnel
with SO_BINDTODEVICE, combined with mangle rules that implement
policy-based routing. After the first ESP encapsulation the packet runs
through the mangle table again, and a difference in skb->mark causes
ip_route_me_harder to be called while skb->sk->sk_bound_dev_if still
refers to the tunnel. The ESP packet is therefore routed back into the
tunnel and xfrm'd again using the same SA. The double-encapsulated
packet is then routed correctly out the physical interface.

With an xfrmi interface on the other side, the packet was dropped with
LINUX_MIB_XFRMINTMPLMISMATCH, whereas an ipvti interface would accept
it. Either way, the transmitting side should not have been doing the
double ESP encapsulation in the first place.

A potential fix for this is to drop the reference to skb->sk using
skb_orphan before transmission. scrub_packet would do this, but only
if the packet is traversing network namespaces. With the reference
dropped, ip_route_me_harder can select the correct route for the ESP
packet without being fooled by an sk_bound_dev_if that points at the
tunnel itself, and the packet is forwarded out the physical interface.
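A toy user-space model of the failure and of the proposed fix, not kernel code. It assumes, as the commit message describes, that a re-route uses the socket's bound device when one is attached, that the mangle rule always writes the same mark (so only the first pass changes it), and that an unchanged mark leaves the packet on its xfrm route out the physical interface. All types, constants and function names here are hypothetical stand-ins; clearing skb->sk models skb_orphan().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface indexes used only by this model. */
enum { IFINDEX_NONE = 0, IFINDEX_PHYS = 2, IFINDEX_TUNNEL = 10 };

/* Hypothetical mark written by the mangle rule on every pass. */
#define MANGLE_MARK 0x10u

struct model_sock {
	int bound_dev_if;		/* models sk->sk_bound_dev_if */
};

struct model_skb {
	struct model_sock *sk;		/* models skb->sk */
	unsigned int mark;		/* models skb->mark */
	int esp_layers;			/* how many times ESP was applied */
};

/*
 * Models the re-route: a socket bound to a device forces that device
 * as the output interface; otherwise the ESP packet's outer destination
 * is assumed to route out the physical interface.
 */
static int model_reroute(const struct model_skb *skb)
{
	if (skb->sk && skb->sk->bound_dev_if != IFINDEX_NONE)
		return skb->sk->bound_dev_if;
	return IFINDEX_PHYS;
}

/*
 * Models repeated passes through the tunnel's transmit path. 'orphan'
 * selects whether the skb_orphan() analogue runs on each pass, as the
 * patch adds. Returns the interface the packet finally leaves on.
 */
static int model_tunnel_output(struct model_skb *skb, int orphan)
{
	for (;;) {
		unsigned int old_mark = skb->mark;

		if (orphan)
			skb->sk = NULL;		/* drop the socket reference */

		skb->esp_layers++;		/* xfrm'd with the tunnel's SA */
		skb->mark = MANGLE_MARK;	/* mangle table sets the mark */

		/* Unchanged mark: no re-route, the xfrm route is used. */
		if (skb->mark == old_mark)
			return IFINDEX_PHYS;

		/* Changed mark: re-route, as ip_route_me_harder would. */
		if (model_reroute(skb) != IFINDEX_TUNNEL)
			return IFINDEX_PHYS;

		/* Routed back into the tunnel: encapsulate again. */
	}
}
```

With a socket bound to IFINDEX_TUNNEL and a starting mark of 0, the model applies ESP twice when orphan is 0 and once when orphan is 1; both runs leave via IFINDEX_PHYS, mirroring the "double-encapsulated but correctly routed" symptom above.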

Signed-off-by: Thomas Winter <Thomas.Winter@...iedtelesis.co.nz>
---
 net/ipv4/ip_vti.c              | 1 +
 net/ipv6/ip6_vti.c             | 1 +
 net/xfrm/xfrm_interface_core.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 159b4473290e..096e9b51816f 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -260,6 +260,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 	skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev)));
 	skb_dst_set(skb, dst);
 	skb->dev = skb_dst(skb)->dev;
+	skb_orphan(skb);

 	err = dst_output(tunnel->net, skb->sk, skb);
 	if (net_xmit_eval(err) == 0)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 09ec4b0ad7dc..d1d5bbaa3d6d 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -530,6 +530,7 @@ vti6_xmit(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 	skb_scrub_packet(skb, !net_eq(t->net, dev_net(dev)));
 	skb_dst_set(skb, dst);
 	skb->dev = skb_dst(skb)->dev;
+	skb_orphan(skb);

 	err = dst_output(t->net, skb->sk, skb);
 	if (net_xmit_eval(err) == 0)
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index 622445f041d3..17b26409e6a0 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -504,6 +504,7 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 	xfrmi_scrub_packet(skb, !net_eq(xi->net, dev_net(dev)));
 	skb_dst_set(skb, dst);
 	skb->dev = tdev;
+	skb_orphan(skb);

 	err = dst_output(xi->net, skb_to_full_sk(skb), skb);
 	if (net_xmit_eval(err) == 0) {
-- 
2.49.0