Message-ID: <c5909e04-35c7-4775-bd17-e17115037792@average.org>
Date: Tue, 24 Jun 2025 17:27:23 +0200
From: Eugene Crosser <crosser@...rage.org>
To: nicolas.dichtel@...nd.com, netdev@...r.kernel.org
Cc: "netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>,
 David Ahern <dsahern@...nel.org>, Florian Westphal <fw@...len.de>,
 Pablo Neira Ayuso <pablo@...filter.org>
Subject: Re: When routed to VRF, NF _output_ hook is run unexpectedly

On 20/06/2025 18:20, Nicolas Dichtel wrote:

>>>> It is possible, and very useful, to implement "two-stage routing" by
>>>> installing a route that points to a VRF device:
>>>>
>>>>     ip link add vrfNNN type vrf table NNN
>>>>     ...
>>>>     ip route add xxxxx/yy dev vrfNNN
>>>>
>>>> however this causes surprising behaviour with relation to netfilter
>>>> hooks. Namely, packets taking such path traverse _output_ nftables
>>>> chain, with conntracking information reset. So, for example, even
>>>> when "notrack" has been set in the prerouting chain, conntrack entries
>>>> will still be created. Script attached below demonstrates this behaviour.
>>> You can have a look to this commit to better understand this:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c9c296adfae9
>>
>> I've seen this commit.
>> My point is that the packets are _not locally generated_ in this case,
>> so it seems wrong to pass them to the _output_ hook, doesn't it?
> They are, from the POV of the vrf. The first route sends packets to the vrf
> device, which acts like a loopback.

I see, this explains the behaviour that I observe.
I believe that there are two problems here though:

1. This behaviour is _surprising_. Packets are not really "locally
generated"; they come from "outside" but are treated as if they were
locally generated. In my view, this deserves a section in
Documentation/networking/vrf.rst (see suggestion below).

2. Using the "output" hook makes it impossible(?) to define different
nftables rules depending on which VRF was used for routing (because iif
is not accessible in the "output" chain). For example, traffic from
different tenants that is routed via different VRFs but egresses over the
same uplink interface cannot be assigned different zones, so conntrack
entries of different tenants will be mixed. As another example, one
cannot disable conntracking of a tenant's traffic while continuing to
track "true output" traffic from the processes running on the host.
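
For anyone who wants to reproduce the setup discussed above, here is a
minimal sketch. It is not the script attached to the original report. The
VRF name, table number, prefix, next hop and nftables table name are all
hypothetical placeholders, and by default the script only prints the
commands instead of running them.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder.
VRF=${VRF:-vrf100}
TABLE=${TABLE:-100}
PREFIX=${PREFIX:-198.51.100.0/24}
NEXTHOP=${NEXTHOP:-192.0.2.1}
DRY_RUN=${DRY_RUN:-1}   # 1: print the commands; 0: execute them (needs root)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

two_stage_setup() {
    # VRF device bound to its own FIB table.
    run ip link add "$VRF" type vrf table "$TABLE"
    run ip link set "$VRF" up
    # First stage: the main table hands matching traffic to the VRF device.
    run ip route add "$PREFIX" dev "$VRF"
    # Second stage: the VRF table picks the real next hop (assumed reachable).
    run ip route add "$PREFIX" via "$NEXTHOP" table "$TABLE"
    # "notrack" in prerouting (priority -300 is the raw priority).
    run nft add table inet demo
    run nft add chain inet demo prerouting '{ type filter hook prerouting priority -300; }'
    run nft add rule inet demo prerouting ip daddr "$PREFIX" notrack
    # Counter in the output hook to see whether forwarded packets hit it.
    run nft add chain inet demo output '{ type filter hook output priority 0; }'
    run nft add rule inet demo output ip daddr "$PREFIX" counter
}

two_stage_setup
```

If the behaviour described in this thread applies, forwarding traffic
towards the prefix should increment the counter in the "output" chain,
and conntrack entries may appear despite the "notrack" rule. Run with
DRY_RUN=0 only in a disposable network namespace or test VM.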

Thanks for consideration,

Eugene

========================
Suggested update to the documentation:

diff --git a/Documentation/networking/vrf.rst b/Documentation/networking/vrf.rst
index 0a9a6f968cb9..74c6a69355df 100644
--- a/Documentation/networking/vrf.rst
+++ b/Documentation/networking/vrf.rst
@@ -61,6 +61,11 @@ domain as a whole.
        the VRF device. For egress POSTROUTING and OUTPUT rules can be written
        using either the VRF device or real egress device.

+.. [3] When a packet is forwarded to a VRF interface, it is further
+       routed according to the routing table associated with the VRF, but
+       is processed by the "output" netfilter hook instead of the "forward"
+       hook.
+
 Setup
 -----
 1. VRF device is created with an association to a FIB table.

