Message-ID: <9182f7b2-da9c-765e-028f-8b9e5c5d4716@iogearbox.net>
Date: Fri, 16 Feb 2018 21:44:01 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Florian Westphal <fw@...len.de>
Cc: netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
davem@...emloft.net, alexei.starovoitov@...il.com
Subject: Re: [PATCH RFC 0/4] net: add bpfilter
Hi Florian,
On 02/16/2018 05:14 PM, Florian Westphal wrote:
> Florian Westphal <fw@...len.de> wrote:
>> Daniel Borkmann <daniel@...earbox.net> wrote:
>> Several questions spinning at the moment, I will probably come up with
>> more:
>
> ... and here there are some more ...
>
> One of the many pain points of xtables design is the assumption of 'used
> only by sysadmin'.
>
> This has not been true for a very long time, so by now iptables has
> this userspace lock (yes, its fugly workaround) to serialize concurrent
> iptables invocations in userspace.
>
> AFAIU the translate-in-userspace design now brings back the old problem
> of different tools overwriting each others iptables rules.
Right, so the behavior would need to be adapted to be exactly the same.
Given that all requests still go into kernel space first via the usual
uapis, I don't think anything would stand in the way of keeping that as is.
> Another question -- am i correct in that each rule manipulation would
> incur a 'recompilation'? Or are there different mini programs chained
> together?
Right now in the PoC, yes: it basically regenerates the program on the fly
in gen.c when walking the struct bpfilter_ipt_ip's and appends the entries
to the program, but it doesn't have to be that way. There are multiple
options to allow for partial code generation, e.g. via chaining tail call
arrays or, eventually, directly via BPF to BPF calls. A few changes would
be needed on the BPF side, but it can be done. Additionally, various
optimization passes could be performed during the code generation phase,
within the given constraints, in order to speed up getting to a verdict.
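As a rough illustration of the "regenerate on every rule change" behavior
described above, here is a minimal sketch in C. It is a toy model only, not
the PoC's gen.c: struct toy_rule, struct toy_insn, toy_regenerate() and
toy_run() are all hypothetical names made up for this example, and the
"instructions" are a made-up two-opcode format rather than real BPF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_INSNS 64

/* Hypothetical rule; NOT the layout of the PoC's struct bpfilter_ipt_ip. */
struct toy_rule {
    uint32_t src_addr;  /* source address to match, host byte order */
    uint32_t src_mask;  /* netmask applied before comparing */
    int verdict;        /* 1 = accept, 0 = drop */
};

/* Made-up opcodes standing in for generated BPF instructions. */
enum toy_op { TOY_MATCH_SRC, TOY_RET };

struct toy_insn {
    enum toy_op op;
    uint32_t addr;
    uint32_t mask;
    int verdict;
};

struct toy_prog {
    struct toy_insn insns[TOY_MAX_INSNS];
    size_t len;
};

/* Append one instruction to the program. */
static void toy_emit(struct toy_prog *p, struct toy_insn insn)
{
    assert(p->len < TOY_MAX_INSNS);
    p->insns[p->len++] = insn;
}

/*
 * Full regeneration: throw the old program away, walk every rule and
 * append its entry, then append the default verdict. Any rule change
 * means calling this again for the whole rule set.
 */
static void toy_regenerate(struct toy_prog *p, const struct toy_rule *rules,
                           size_t n_rules, int default_verdict)
{
    p->len = 0;
    for (size_t i = 0; i < n_rules; i++) {
        struct toy_insn match = {
            TOY_MATCH_SRC, rules[i].src_addr, rules[i].src_mask,
            rules[i].verdict
        };
        toy_emit(p, match);
    }
    struct toy_insn ret = { TOY_RET, 0, 0, default_verdict };
    toy_emit(p, ret);
}

/* Interpret the toy program for one packet's source address. */
static int toy_run(const struct toy_prog *p, uint32_t src)
{
    for (size_t i = 0; i < p->len; i++) {
        const struct toy_insn *in = &p->insns[i];

        if (in->op == TOY_RET)
            return in->verdict;
        if ((src & in->mask) == (in->addr & in->mask))
            return in->verdict;  /* first matching rule wins */
    }
    return 0;  /* unreachable if the program ends in TOY_RET */
}
```

In this model the cost of a rule change grows with the total number of
rules. A partial-generation scheme would instead keep one small program per
chain and replace only the affected one, which is roughly what chaining
through tail call arrays would allow.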
> One of the nftables advantages is that (since rule representation in
> kernel is black-box from userspace point of view) is that the kernel
> can announce add/delete of rules or elements from nftables sets.
>
> Any particular reason why translating iptables rather than nftables
> (it should be possible to monitor the nftables changes that are
> announced by kernel and act on those)?
Yeah, correct, this should be possible as well. We started out with the
iptables part in the demo as the majority of bigger infrastructure projects
all still rely heavily on it (e.g. docker and k8s, to name just two big
ones). Usually they have their requests to iptables baked directly into
their code, which probably won't change any time soon, so the thought was
that they could benefit from it early on, once there is sufficient coverage.
Thanks,
Daniel