Message-ID: <20160922092138.GA12108@salvia>
Date: Thu, 22 Sep 2016 11:21:38 +0200
From: Pablo Neira Ayuso <pablo@...filter.org>
To: Thomas Graf <tgraf@...g.ch>
Cc: Daniel Mack <daniel@...que.org>, htejun@...com,
daniel@...earbox.net, ast@...com, davem@...emloft.net,
kafai@...com, fw@...len.de, harald@...hat.com,
netdev@...r.kernel.org, sargun@...gun.me, cgroups@...r.kernel.org
Subject: Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs
On Wed, Sep 21, 2016 at 08:48:27PM +0200, Thomas Graf wrote:
> On 09/21/16 at 05:45pm, Pablo Neira Ayuso wrote:
> > On Tue, Sep 20, 2016 at 06:43:35PM +0200, Daniel Mack wrote:
> > > The point is that from an application's perspective, restricting the
> > > ability to bind a port and dropping packets that are being sent is a
> > > very different thing. Applications will start to behave differently if
> > > they can't bind to a port, and that's something we do not want to happen.
> >
> > What exactly is the problem? Applications are not checking the
> > return value from bind()? They should be fixed. If you want to
> > collect statistics, I see no reason why you couldn't collect them
> > for every EACCES on each bind() call.
>
> It's not about applications not checking the return value of bind().
> Unfortunately, many applications (or the respective libraries they use)
> retry on connect() failure but handle bind() errors as a hard failure
> and exit. Yes, it's an application or library bug, but these
> applications have very specific expectations of how something fails.
> Sometimes even going from drop to RST will break applications.
>
> Paranoia speaking: by returning errors where no error was returned
> before, undefined behaviour occurs. In Murphy speak: things break.
>
> This is a given and we can't fix it from the kernel side. Returning
> errors at the system call level has many benefits, but it's not
> always an option.
>
> Adding the late hook does not prevent filtering at the socket layer
> from also being added. I think we need both.
I have a hard time buying this new specific hook. I think we should
shift the focus of this debate; this is my proposal to untangle it:
You add a net/netfilter/nft_bpf.c expression that allows you to run
bpf programs from nf_tables. This expression can either run bpf
programs in a similar fashion to tc+bpf or run the bpf program that
you have attached to the cgroup.
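Roughly, the eval path of such an expression could look like the
sketch below. To be clear, this is hypothetical: nft_bpf.c does not
exist and struct nft_bpf is made up; only the nft_expr_ops eval
signature, nft_expr_priv() and BPF_PROG_RUN() are existing kernel
interfaces:

	/* net/netfilter/nft_bpf.c (hypothetical) */
	#include <linux/filter.h>		/* BPF_PROG_RUN, struct bpf_prog */
	#include <net/netfilter/nf_tables.h>	/* nft expression infrastructure */

	struct nft_bpf {
		struct bpf_prog	*prog;	/* attached via netlink, e.g. by fd */
	};

	static void nft_bpf_eval(const struct nft_expr *expr,
				 struct nft_regs *regs,
				 const struct nft_pktinfo *pkt)
	{
		const struct nft_bpf *priv = nft_expr_priv(expr);

		/* Run the eBPF program on the packet and map its return
		 * value to an nf_tables verdict.
		 */
		if (BPF_PROG_RUN(priv->prog, pkt->skb))
			regs->verdict.code = NFT_CONTINUE;
		else
			regs->verdict.code = NF_DROP;
	}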
To achieve this, I'd suggest you also add a new bpf chain type. That
new chain type would basically provide raw access to the netfilter
hooks via the nf_tables netlink interface. This bpf chain would
exclusively take rules that use this new bpf expression.
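The registration of that chain type could be modeled on the existing
"filter" chain type in net/netfilter/nf_tables_ipv4.c. Again, only a
sketch for the same hypothetical file: nft_bpf_do_chain() and the
"bpf" name are invented, while struct nf_chain_type,
nft_set_pktinfo() and nft_do_chain() are the existing interfaces:

	static unsigned int nft_bpf_do_chain(void *priv, struct sk_buff *skb,
					     const struct nf_hook_state *state)
	{
		struct nft_pktinfo pkt;

		/* Set up pktinfo and run the rules of this chain,
		 * including the bpf expression sketched above.
		 */
		nft_set_pktinfo(&pkt, skb, state);
		return nft_do_chain(&pkt, priv);
	}

	static const struct nf_chain_type nft_chain_bpf = {
		.name		= "bpf",
		.type		= NFT_CHAIN_T_DEFAULT,	/* or a new NFT_CHAIN_T_BPF */
		.family		= NFPROTO_IPV4,
		.owner		= THIS_MODULE,
		.hook_mask	= (1 << NF_INET_PRE_ROUTING) |
				  (1 << NF_INET_LOCAL_IN) |
				  (1 << NF_INET_FORWARD) |
				  (1 << NF_INET_LOCAL_OUT) |
				  (1 << NF_INET_POST_ROUTING),
		.hooks		= {
			[NF_INET_PRE_ROUTING]	= nft_bpf_do_chain,
			[NF_INET_LOCAL_IN]	= nft_bpf_do_chain,
			[NF_INET_FORWARD]	= nft_bpf_do_chain,
			[NF_INET_LOCAL_OUT]	= nft_bpf_do_chain,
			[NF_INET_POST_ROUTING]	= nft_bpf_do_chain,
		},
	};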
I see good things in this proposal:
* This is consistent with what we offer via tc+bpf.
* It becomes easily visible to the user that a bpf program is running
  in the packet path, or that cgroup+bpf filtering is going on. Thus,
  no matter what those orchestrators do, this filtering stays visible
  to sysadmins who are familiar with the existing command line
  tooling (see the example after this list).
* You get access to all of the existing netfilter hooks in one go.
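To make the visibility point concrete, such a setup would show up in
the standard ruleset listing. The chain type and rule syntax below
are invented for illustration, since no "bpf" chain type or
expression exists in nft today; only the generic add/list commands
are real:

	# nft add table ip filter
	# nft add chain ip filter out { type bpf hook output priority 0 \; }
	# nft add rule ip filter out bpf cgroup-prog
	# nft list ruleset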
A side note on this: I would suggest this conversation focuses on
discussing aspects at a slightly higher level rather than counting
raw load and store instructions... I think this effort requires
looking at the whole forest, instead of barfing at one single tree.
Genericity always comes at a slight cost, and to all the
programmability fans here, please remember that we have a generic
stack on our hands after all. So let's try to accommodate these new
requirements in a way that makes sense.
Thanks.