netdev - Re: [PATCH nf-next v4 4/5] netfilter: Introduce egress hook

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210911212650.GA17551@wunner.de>
Date:   Sat, 11 Sep 2021 23:26:50 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Pablo Neira Ayuso <pablo@...filter.org>,
        Jozsef Kadlecsik <kadlec@...filter.org>,
        Florian Westphal <fw@...len.de>,
        netfilter-devel@...r.kernel.org, coreteam@...filter.org,
        netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Thomas Graf <tgraf@...g.ch>,
        Laura Garcia Liebana <nevola@...il.com>,
        John Fastabend <john.fastabend@...il.com>
Subject: Re: [PATCH nf-next v4 4/5] netfilter: Introduce egress hook

On Tue, Jan 26, 2021 at 08:13:19PM +0100, Daniel Borkmann wrote:
> On 1/22/21 9:47 AM, Lukas Wunner wrote:
> > Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
> > handle_ing() under unique static key") introduced the ability to
> > classify packets with netfilter on ingress.
> > 
> > Support the same on egress to satisfy user requirements such as:
[...]
> > The hook is positioned after packet handling by traffic control.
> > Thus, if packets are redirected into and out of containers with tc,
> > the data path is:
> > ingress: host tc -> container tc -> container nft
> > egress:  container tc -> host tc -> host nft
> > 
> > This was done to address an objection from Daniel Borkmann:  If desired,
> > nft does not get into tc's way performance-wise.  The host is able to
> > firewall malicious packets coming out of a container, but only after tc
> > has done its duty.  An implication is that tc may set skb->mark on
> > egress for nft to act on it, but not the other way round.
> [...]
> > @@ -4096,13 +4098,18 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   	qdisc_pkt_len_init(skb);
> >   #ifdef CONFIG_NET_CLS_ACT
> >   	skb->tc_at_ingress = 0;
> > -# ifdef CONFIG_NET_EGRESS
> > +#endif
> > +#ifdef CONFIG_NET_EGRESS
> >   	if (static_branch_unlikely(&egress_needed_key)) {
> >   		skb = sch_handle_egress(skb, &rc, dev);
> >   		if (!skb)
> >   			goto out;
> > +		if (nf_hook_egress_active()) {
> > +			skb = nf_hook_egress(skb, &rc, dev);
> > +			if (!skb)
> > +				goto out;
> > +		}
> 
> This won't work unfortunately, the issue is that this now creates an
> asymmetric path, for example:
>   [tc ingress] -> [nf ingress] -> [host] -> [tc egress] -> [nf egress].
> So any NAT translation done on tc ingress + tc egress will break on
> the nf hooks given nf is not able to correlate inbound with outbound
> traffic.
> 
> Likewise for container-based traffic that is in its own netns,
> one of the existing paths is:
>   [tc ingress (phys,hostns)] -> [tc ingress (veth,podns)] -> [reply] ->
>   [tc egress (veth,podns)] -> [tc ingress (veth,hostns)] ->
>   [nf egress (phys,hostns)*] -> [tc egress (phys,hostns)].
> As can be seen the [nf ingress] hook is not an issue at all given
> everything operates in tc layer but the [nf egress*] one is in this case,
> as it touches in tc layer where existing data planes will start to break
> upon rule injection.
> 
> Wrt above objection, what needs to be done for the minimum case is to
> i) fix the asymmetry problem from here, and
> ii) add a flag for tc layer-redirected skbs to skip the nf egress hook *;
> with that in place this concern should be resolved. Thanks!

Daniel, your reply above has left me utterly confused.  After thinking
about it for a while, I'm requesting clarification what you really want.
We do want the netfilter egress hook in mainline and we're happy to
accommodate to your requirements, but we need you to specify them clearly.

Regarding the data path for packets which are going out from a container,
I think you've erroneously mixed up the last two elements above:
If the nf egress hook is placed after tc egress (as is done in the present
patch), the data path is actually as follows:

  [tc egress (veth,podns)] -> [tc ingress (veth,hostns)] ->
  [tc egress (phys,hostns)] -> [nf egress (phys,hostns)*]

By contrast, if nf egress is put in front of tc egress (as you're
requesting above), the data path becomes:

  [nf egress (veth,podns)] -> [tc egress (veth,podns)] ->
  [tc ingress (veth,hostns)] ->
  [nf egress (phys,hostns)*] -> [tc egress (phys,hostns)]

So this order results in an *additional* nf egress hook in the data path,
which may cost performance.  Previously you voiced concerns that the
nf egress hook may degrade performance.  To address that concern,
we ordered nf egress *after* tc egress, thereby avoiding any performance
impact as much as possible.

I agree with your argument that an inverse order vis-a-vis ingress is
more logical because it avoids NAT incongruencies etc.  So I'll be happy
to reorder nf egress before tc egress.  I'm just confused that you're now
requesting an order which may be *more* detrimental to performance.

As a reminder:
- If no nf egress rules are in use on the system, there is no performance
  impact at all because the hook is patched out of the data path.
- If an interface on the system uses nf egress rules, all other interfaces
  suffer a minor performance impact because a NULL pointer check is
  performed for sk_buff->nf_hooks_egress for every packet.
- The actual netfilter processing only happens on the interface(s)
  for which nf egress rules have been configured.

You further request above to "add a flag for tc layer-redirected skbs
to skip the nf egress hook *".

However the data path *starts* with an nf egress hook.  So even if a flag
to skip subsequent nf egress processing is set upon tc redirect,
the initial nf egress hook is never skipped.  The benefit of such a flag
is therefore questionable, which is another source of confusion to me.

Again, please clarify what your concerns are and how they can be addressed.

Thanks,

Lukas