netdev - Re: [PATCH v5 0/6] Add eBPF hooks for cgroups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 13 Sep 2016 16:14:34 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Daniel Mack <daniel@...que.org>,
        Pablo Neira Ayuso <pablo@...filter.org>
CC:     htejun@...com, ast@...com, davem@...emloft.net, kafai@...com,
        fw@...len.de, harald@...hat.com, netdev@...r.kernel.org,
        sargun@...gun.me, cgroups@...r.kernel.org
Subject: Re: [PATCH v5 0/6] Add eBPF hooks for cgroups

On 09/13/2016 03:31 PM, Daniel Mack wrote:
> On 09/13/2016 01:56 PM, Pablo Neira Ayuso wrote:
>> On Mon, Sep 12, 2016 at 06:12:09PM +0200, Daniel Mack wrote:
>>> This is v5 of the patch set to allow eBPF programs for network
>>> filtering and accounting to be attached to cgroups, so that they apply
>>> to all sockets of all tasks placed in that cgroup. The logic also
>>> allows to be extendeded for other cgroup based eBPF logic.
>>
>> 1) This infrastructure can only be useful to systemd, or any similar
>>     orchestration daemon. Look, you can only apply filtering policies
>>     to processes that are launched by systemd, so this only works
>>     for server processes.
>
> Sorry, but both statements aren't true. The eBPF policies apply to every
> process that is placed in a cgroup, and my example program in 6/6 shows
> how that can be done from the command line. Also, systemd is able to
> control userspace processes just fine, and it not limited to 'server
> processes'.
>
>> For client processes this infrastructure is
>>     *racy*, you have to add new processes in runtime to the cgroup,
>>     thus there will be time some little time where no filtering policy
>>     will be applied. For quality of service, this may be an acceptable
>>     race, but this is aiming to deploy a filtering policy.
>
> That's a limitation that applies to many more control mechanisms in the
> kernel, and it's something that can easily be solved with fork+exec.
>
>> 2) This aproach looks uninfrastructured to me. This provides a hook
>>     to push a bpf blob at a place in the stack that deploys a filtering
>>     policy that is not visible to others.
>
> That's just as transparent as SO_ATTACH_FILTER. What kind of
> introspection mechanism do you have in mind?
>
>> We have interfaces that allows
>>     us to dump the filtering policy that is being applied, report events
>>     to enable cooperation between several processes with similar
>>     capabilities and so on.
>
> Well, in practice, for netfilter, there can only be one instance in the
> system that acts as central authoritative, otherwise you'll end up with
> orphaned entries or with situation where some client deletes rules
> behind the back of the one that originally installed it. So I really
> think there is nothing wrong with demanding a single, privileged
> controller to manage things.
>
>>> After chatting with Daniel Borkmann and Alexei off-list, we concluded
>>> that __dev_queue_xmit() is the place where the egress hooks should live
>>> when eBPF programs need access to the L2 bits of the skb.
>>
>> 3) This egress hook is coming very late, the only reason I find to
>>     place it at __dev_queue_xmit() is that bpf naturally works with
>>     layer 2 information in place. But this new hook is placed in
>>     _everyone's output ath_ that only works for the very specific
>>     usecase I exposed above.
>
> It's about filtering outgoing network packets of applications, and
> providing them with L2 information for filtering purposes. I don't think
> that's a very specific use-case.
>
> When the feature is not used at all, the added costs on the output path
> are close to zero, due to the use of static branches. If used somewhere
> in the system but not for the packet in flight, costs are slightly
> higher but acceptable. In fact, it's not even measurable in my tests
> here. How is that different from the netfilter OUTPUT hook, btw?
>
> That said, limiting it to L3 is still an option. It's just that we need
> ingress and egress to be in sync, so both would be L3 then. So far, the
> possible advantages for future use-cases having access to L2 outweighed
> the concerns of putting the hook to dev_queue_xmit(), but I'm open to
> discussing that.

While I fully disagree with Pablo's point 1) and 2), in the last set I
raised a similar concern as in point 3) wrt __dev_queue_xmit(). The set
as-is would indeed need the L2 info, since a filter could do a load via
LLVM built-ins such as asm("llvm.bpf.load.byte") et al, with BPF_LL_OFF,
where we're forced to do a load relative to skb_mac_header(). As stated
by Daniel already, it would be nice to see the full frame, so it comes
down to a trade-off, but the option of L3 onwards also exists and BPF can
work just fine with it, too. This just means it's placed in the local
output path and the verifier would need to disallow these built-ins during
bpf(2) load time. They are a rather cumbersome legacy anyway, so
bpf_skb_load_bytes() helper can be used instead, which is also easier
to use.

>> The main concern during the workshop was that a hook only for cgroups
>> is too specific, but this is actually even more specific than this.
>
> This patch set merely implements an infrastructure that can accommodate
> many more things as well in the future. We could, in theory, even add
> hooks for forwarded packets specifically, or other eBPF programs, not
> even for network filtering etc.
>
>> I have nothing against systemd or the needs for more
>> programmability/flexibility in the stack, but I think this needs to
>> fulfill some requirements to fit into the infrastructure that we have
>> in the right way.
>
> Well, as I explained already, this patch set results from endless
> discussions that went nowhere, about how such a thing can be achieved
> with netfilter.
>
>
> Thanks,
> Daniel
>