Message-ID: <98348818-28c5-4cb2-556b-5061f77e112c@arcanite.ch>
Date: Wed, 28 Sep 2022 16:02:43 +0200
From: Maximilien Cuony <maximilien.cuony@...anite.ch>
To: "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [REGRESSION] Unable to NAT own TCP packets from another VRF with
tcp_l3mdev_accept = 1
Hello,
We're using VRF on a machine that acts as a router and have a specific
issue where the router doesn't handle its own packets correctly during
NATing if the packet is coming from a different VRF.
We had the issue with Debian buster (4.19), but it went away when we
updated to Debian bullseye (5.10.92).
However, after upgrading Debian bullseye to the latest kernel
(5.10.140), the issue appeared again.
We did a bisection and this led us to
"b0d67ef5b43aedbb558b9def2da5b4fffeb19966 net: allow unbound socket for
packets in VRF when tcp_l3mdev_accept set [ Upstream commit
944fd1aeacb627fa617f85f8e5a34f7ae8ea4d8e ]".
Simplified case setup:
There are two machines in the setup. They both forward packets
(net.ipv4.ip_forward = 1) and there are two interfaces between them.
The main machine has two VRFs. The default VRF uses the second
machine as its default route, on a specific interface.
The second machine has its default route via the main machine, in the
other VRF, using the second pair of interfaces.
On the main machine, the second interface is in a specific VRF. In that
VRF, packets are NATed to the internet on a third interface.
A diagram of the normal flow is available here:
https://etinacra.ch/kernel.png
Configuration commands:
Main machine:
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.ip_forward=1
iptables -t raw -A PREROUTING -i eth0 -j CT --zone 5
iptables -t raw -A OUTPUT -o eth0 -j CT --zone 5
iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to 192.168.1.1
cat /etc/network/interfaces
auto firewall
iface firewall
vrf-table 1200
auto eth0
iface eth0
address 192.168.5.1/24
gateway 192.168.5.2
auto eth1
iface eth1
address 192.168.10.1/24
vrf firewall
up ip route add 192.168.5.0/24 via 192.168.10.2 vrf firewall
auto eth2
iface eth2
address 192.168.1.1/24
gateway 192.168.1.250
vrf firewall
==
Second machine:
sysctl -w net.ipv4.ip_forward=1
cat /etc/network/interfaces
auto eth0
iface eth0
address 192.168.5.2/24
auto eth1
iface eth1
address 192.168.10.2/24
gateway 192.168.10.1
==
When the issue is not present, a tcpdump on all interfaces of the main
machine shows that everything is fine (output truncated):
10:28:32.811283 eth0 Out IP 192.168.5.1.55750 > 99.99.99.99.80: Flags
[S], seq 2216112145
10:28:32.811666 eth1 In IP 192.168.5.1.55750 > 99.99.99.99.80: Flags
[S], seq 2216112145
10:28:32.811679 eth2 Out IP 192.168.1.1.55750 > 99.99.99.99.80: Flags
[S], seq 2216112145
10:28:32.835138 eth2 In IP 99.99.99.99.80 > 192.168.1.1.55750: Flags
[S.], seq 383992840, ack 2216112146
10:28:32.835152 eth1 Out IP 99.99.99.99.80 > 192.168.5.1.55750: Flags
[S.], seq 383992840, ack 2216112146
10:28:32.835457 eth0 In IP 99.99.99.99.80 > 192.168.5.1.55750: Flags
[S.], seq 383992840, ack 2216112146
10:28:32.835511 eth0 Out IP 192.168.5.1.55750 > 99.99.99.99.80: Flags
[.], ack 1, win 502
However, when the issue is present, the SYN-ACK does arrive on eth2,
but is never "unNATed" back to eth1:
10:25:07.644433 eth0 Out IP 192.168.5.1.48684 > 99.99.99.99.80: Flags
[S], seq 3207393154
10:25:07.644782 eth1 In IP 192.168.5.1.48684 > 99.99.99.99.80: Flags
[S], seq 3207393154
10:25:07.644793 eth2 Out IP 192.168.1.1.48684 > 99.99.99.99.80: Flags
[S], seq 3207393154
10:25:07.668551 eth2 In IP 54.36.61.42.80 > 192.168.1.1.48684: Flags
[S.], seq 823335485, ack 3207393155
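To compare captures like the two above without reading them by eye, here
is a minimal Python sketch. It is only an illustration: the names
EXPECTED_HOPS, observed_hops and first_missing_hop are ours, the expected
hop order is simply copied from the working capture, and it assumes the
tcpdump output has one packet per line (unlike the wrapped lines in this
mail).

```python
import re

# Hops that the SYN and its SYN-ACK traverse on the main machine,
# copied from the working capture (assumption: same order every time).
EXPECTED_HOPS = [
    ("eth0", "Out"), ("eth1", "In"), ("eth2", "Out"),  # SYN
    ("eth2", "In"), ("eth1", "Out"), ("eth0", "In"),   # SYN-ACK
]

# Matches lines such as "10:28:32.811283 eth0 Out IP 192.168.5.1.55750 > ..."
LINE_RE = re.compile(r"^\S+\s+(?P<iface>\S+)\s+(?P<dir>In|Out)\s+IP\s")


def observed_hops(capture):
    """Return the (interface, direction) pairs seen in a capture, in order."""
    hops = []
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hops.append((m.group("iface"), m.group("dir")))
    return hops


def first_missing_hop(capture):
    """Return the first expected hop that is absent, or None if all are seen."""
    hops = observed_hops(capture)
    for i, expected in enumerate(EXPECTED_HOPS):
        if i >= len(hops) or hops[i] != expected:
            return expected
    return None
```

Fed the failing capture, first_missing_hop() reports ("eth1", "Out"),
which is the hop where the SYN-ACK should have left towards the second
machine after being unNATed.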
The issue is only with TCP connections. UDP and ICMP work fine.
Setting net.ipv4.tcp_l3mdev_accept back to 0 also fixes the issue, but
we need this flag since we use some sockets that do not understand VRFs.
We had a look at the diff and at the code of inet_bound_dev_eq, but we
did not really understand the underlying problem. It does seem that
bound_dev_if is now checked to be non-zero before the bound_dev_if ==
dif || bound_dev_if == sdif comparison, something that was not the case
before (especially since the result depends on l3mdev_accept).
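To make that suspicion concrete, here is a small Python model of the two
checks as we read them from the diff. This is not kernel code: the
function names match_before and match_after are ours, the ifindex values
are made up, and the "before" form in particular is only our
understanding of the older lookup condition.

```python
def match_before(bound_dev_if, dif, sdif):
    """Older check as we read it: plain comparison, no special case for 0."""
    return bound_dev_if == dif or bound_dev_if == sdif


def match_after(l3mdev_accept, bound_dev_if, dif, sdif):
    """Newer check as we read it: the unbound case is handled first."""
    if not bound_dev_if:
        # Unbound socket: matches if the packet is not in a VRF (sdif == 0),
        # or in any case when tcp_l3mdev_accept is set.
        return bool((not sdif) or l3mdev_accept)
    return bound_dev_if == dif or bound_dev_if == sdif


# Hypothetical ifindex values, for illustration only.
ETH2_IFINDEX = 4  # interface the SYN-ACK arrives on
VRF_IFINDEX = 3   # "firewall" VRF master device
UNBOUND = 0       # socket not bound to any device

print(match_before(UNBOUND, ETH2_IFINDEX, VRF_IFINDEX))        # False
print(match_after(True, UNBOUND, ETH2_IFINDEX, VRF_IFINDEX))   # True
print(match_after(False, UNBOUND, ETH2_IFINDEX, VRF_IFINDEX))  # False
```

If this reading is right, an unbound socket now matches a packet
received inside a VRF when tcp_l3mdev_accept is set, whereas it did not
match before. That would fit with the issue disappearing when the sysctl
is set back to 0, but we have not confirmed that this is the actual
cause.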
Maybe our setup is wrong and we should not be able to route packets like
that?
Thanks a lot and have a nice day!
Maximilien Cuony