lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 15 Sep 2021 11:45:36 +0200 From: Daniel Borkmann <daniel@...earbox.net> To: Lukas Wunner <lukas@...ner.de> Cc: Pablo Neira Ayuso <pablo@...filter.org>, Jozsef Kadlecsik <kadlec@...filter.org>, Florian Westphal <fw@...len.de>, netfilter-devel@...r.kernel.org, coreteam@...filter.org, netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>, Eric Dumazet <edumazet@...gle.com>, Thomas Graf <tgraf@...g.ch>, Laura Garcia Liebana <nevola@...il.com>, John Fastabend <john.fastabend@...il.com> Subject: Re: [PATCH nf-next v4 4/5] netfilter: Introduce egress hook Hi Lukas, On 9/11/21 11:26 PM, Lukas Wunner wrote: > On Tue, Jan 26, 2021 at 08:13:19PM +0100, Daniel Borkmann wrote: >> On 1/22/21 9:47 AM, Lukas Wunner wrote: [...] >> Wrt above objection, what needs to be done for the minimum case is to >> i) fix the asymmetry problem from here, and >> ii) add a flag for tc layer-redirected skbs to skip the nf egress hook *; >> with that in place this concern should be resolved. Thanks! > > Daniel, your reply above has left me utterly confused. After thinking > about it for a while, I'm requesting clarification what you really want. > We do want the netfilter egress hook in mainline and we're happy to > accommodate to your requirements, but we need you to specify them clearly. > > Regarding the data path for packets which are going out from a container, > I think you've erroneously mixed up the last two elements above: > If the nf egress hook is placed after tc egress (as is done in the present > patch), the data path is actually as follows: > > [tc egress (veth,podns)] -> [tc ingress (veth,hostns)] -> > [tc egress (phys,hostns)] -> [nf egress (phys,hostns)*] > > By contrast, if nf egress is put in front of tc egress (as you're > requesting above), the data path becomes: > > [nf egress (veth,podns)] -> [tc egress (veth,podns)] -> > [tc ingress (veth,hostns)] -> > [nf egress (phys,hostns)*] -> [tc egress (phys,hostns)] > > So this order results in an *additional* nf egress hook in the data path, > which may cost performance. Previously you voiced concerns that the > nf egress hook may degrade performance. To address that concern, > we ordered nf egress *after* tc egress, thereby avoiding any performance > impact as much as possible. > > I agree with your argument that an inverse order vis-a-vis ingress is > more logical because it avoids NAT incongruencies etc. So I'll be happy > to reorder nf egress before tc egress. I'm just confused that you're now > requesting an order which may be *more* detrimental to performance. Right, the issue is that placing it either in front or after the existing tc egress hook just as-is results in layering violations given both tc/nf subsystems will inevitably step on each other when both dataplanes are used by different components where things break. To provide a walk-through on what I think would work w/o breaking layers: 1) Packet going up hostns stack. Here, we'll end up with: [tc ingress (phys,hostns)] -> [nf ingress (phys,hostns)] -> upper stack -> [nf egress (phys,hostns)] -> [tc egress (phys,hostns)] 2) Packet going up podns stack. Here, if the forwarding happens entirely in tc layer, then the datapath (as-is today) operates _below_ nf and must look like: [tc ingress (phys,hostns)] -> [tc egress (veth,hostns)] -> netns switch -> (*) -> netns switch -> [tc ingress (veth,hostns)] -> [tc egress (phys,hostns)] For simplicity, I left out (*), but it's essentially the same as case 1) just for the Pod's/container's stack PoV. 3) Packet is locally forwarded between Pods. Same as 2) for the case where the forwarding happens entirely in tc (as-is today) which operates _below_ nf and must look like: (+) -> [tc ingress (veth,hostns)] -> [tc egress (veth,hostns)] -> netns switch -> (*) -> netns switch -> [tc ingress (veth,hostns)] -> [tc egress (veth,hostns)] -> (+) The (+) denotes the src/sink where we enter/exit the hostns after netns switch. The skb bit marker would be that tc [& BPF]-related redirect functions are internally setting that bit, so that nf egress hook is skipped for cases like 2)/3). Walking through a similar 1/2/3) scenario from nf side one layer higher if you were to do an equivalent, things would look like: 1) Packet going up hostns stack. Same as above: [tc ingress (phys,hostns)] -> [nf ingress (phys,hostns)] -> upper stack -> [nf egress (phys,hostns)] -> [tc egress (phys,hostns)] 2) Packet going up podns stack with forwarding from nf side: [tc ingress (phys,hostns)] -> [nf ingress (phys,hostns)] -> [nf egress (veth,hostns)] -> [tc egress (veth,hostns)] -> netns switch -> (*) -> netns switch -> [tc ingress (veth,hostns)] -> [nf ingress (veth,hostns)] -> [nf egress (phys,hostns)] -> [tc egress (phys,hostns)] 3) Packet is locally forwarded between Pods with forwarding from nf side: (+) -> [tc ingress (veth,hostns)] -> [nf ingress (veth,hostns)] -> [nf egress (veth,hostns)] -> [tc egress (veth,hostns)] -> netns switch -> (*) -> netns switch -> [tc ingress (veth,hostns)] -> [nf ingress (veth,hostns)] -> [nf egress (veth,hostns)] -> [tc egress (veth,hostns)] -> (+) Thanks, Daniel
Powered by blists - more mailing lists