netdev - Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b2985e15-336c-4aca-c0c7-b6609adf5397@iogearbox.net>
Date:   Fri, 7 Oct 2022 21:06:07 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     sdf@...gle.com,
        Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        bpf <bpf@...r.kernel.org>,
        Nikolay Aleksandrov <razor@...ckwall.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        John Fastabend <john.fastabend@...il.com>,
        Joanne Koong <joannelkoong@...il.com>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Joe Stringer <joe@...ium.io>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc
 BPF programs

On 10/7/22 8:11 PM, sdf@...gle.com wrote:
> On 10/07, Toke Høiland-Jørgensen wrote:
>> sdf@...gle.com writes:
[...]
>> >> I was thinking a little about how this might work; i.e., how can the
>> >> kernel expose the required knobs to allow a system policy to be
>> >> implemented without program loading having to talk to anything other
>> >> than the syscall API?
>> >
>> >> How about we only expose prepend/append in the prog attach UAPI, and
>> >> then have a kernel function that does the sorting like:
>> >
>> >> int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t num_progs, struct
>> >> bpf_prog *new_prog, bool append)
>> >
>> >> where the default implementation just appends/prepends to the array in
>> >> progs depending on the value of 'appen'.
>> >
>> >> And then use the __weak linking trick (or maybe struct_ops with a member
>> >> for TXC, another for XDP, etc?) to allow BPF to override the function
>> >> wholesale and implement whatever ordering it wants? I.e., allow it can
>> >> to just shift around the order of progs in the 'progs' array whenever a
>> >> program is loaded/unloaded?
>> >
>> >> This way, a userspace daemon can implement any policy it wants by just
>> >> attaching to that hook, and keeping things like how to express
>> >> dependencies as a userspace concern?
>> >
>> > What if we do the above, but instead of simple global 'attach first/last',
>> > the default api would be:
>> >
>> > - attach before <target_fd>
>> > - attach after <target_fd>
>> > - attach before target_fd=-1 == first
>> > - attach after target_fd=-1 == last
>> >
>> > ?
> 
>> Hmm, the problem with that is that applications don't generally have an
>> fd to another application's BPF programs; and obtaining them from an ID
>> is a privileged operation (CAP_SYS_ADMIN). We could have it be "attach
>> before target *ID*" instead, which could work I guess? But then the
>> problem becomes that it's racy: the ID you're targeting could get
>> detached before you attach, so you'll need to be prepared to check that
>> and retry; and I'm almost certain that applications won't test for this,
>> so it'll just lead to hard-to-debug heisenbugs. Or am I being too
>> pessimistic here?
> 
> Yeah, agreed, id would work better. I guess I'm mostly coming here
> from the bpftool/observability perspective where it seems handy to
> being able to stick into any place in the chain for debugging?
> 
> Not sure if we need to care about raciness here. The same thing applies
> for things like 'list all programs and dump their info' and all other
> similar rmw operations?

For such case you have the issue that kernel has source of truth and
some kind of agent would have its own view when it wants to do dependency
injection, so it needs to keep syncing with kernel view somehow (given
there can be multiple entities changing it), and then question is also
understanding context 'what is target fd/id xyz'. The latter you have as
well when you, say, tell a system daemon that it needs to install a struct_ops
program to manage these things (see remark wrt containers only host netns
being shared) - now you moved that 'prio 1 conflict' situation to a meta
level where the orchestration progs fight over each other wrt who's first.

> But, I guess, most users will still do 'attach target id = -1' aka
> 'attach last' which probably makes this flexibility unnecessary?
> OTOH, the users that still want it (bpftool/observability) might use it.

To me it sounds reasonable to have the append mode as default mode/API,
and an advanced option to say 'I want to run as 2nd prog, but if something
is already attached as 2nd prog, shift all the others +1 in the array' which
would relate to your above point, Stan, of being able to stick into any
place in the chain.

Thanks,
Daniel