lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <Y0BaBUWeTj18V5Xp@google.com> Date: Fri, 7 Oct 2022 09:55:33 -0700 From: sdf@...gle.com To: "Toke Høiland-Jørgensen" <toke@...hat.com> Cc: Daniel Borkmann <daniel@...earbox.net>, Alexei Starovoitov <alexei.starovoitov@...il.com>, bpf <bpf@...r.kernel.org>, Nikolay Aleksandrov <razor@...ckwall.org>, Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, John Fastabend <john.fastabend@...il.com>, Joanne Koong <joannelkoong@...il.com>, Kumar Kartikeya Dwivedi <memxor@...il.com>, Joe Stringer <joe@...ium.io>, Network Development <netdev@...r.kernel.org> Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs On 10/07, Toke H�iland-J�rgensen wrote: > Daniel Borkmann <daniel@...earbox.net> writes: > > On 10/7/22 1:28 AM, Alexei Starovoitov wrote: > >> On Thu, Oct 6, 2022 at 2:29 PM Daniel Borkmann <daniel@...earbox.net> > wrote: > >>> On 10/6/22 7:00 AM, Alexei Starovoitov wrote: > >>>> On Wed, Oct 05, 2022 at 01:11:34AM +0200, Daniel Borkmann wrote: > >>> [...] > >>>> > >>>> I cannot help but feel that prio logic copy-paste from old tc, > netfilter and friends > >>>> is done because "that's how things were done in the past". > >>>> imo it was a well intentioned mistake and all networking things (tc, > netfilter, etc) > >>>> copy-pasted that cumbersome and hard to use concept. > >>>> Let's throw away that baggage? > >>>> In good set of cases the bpf prog inserter cares whether the prog is > first or not. > >>>> Since the first prog returning anything but TC_NEXT will be final. > >>>> I think prog insertion flags: 'I want to run first' vs 'I don't care > about order' > >>>> is good enough in practice. Any complex scheme should probably be > programmable > >>>> as any policy should. For example in Meta we have 'xdp chainer' > logic that is similar > >>>> to libxdp chaining, but we added a feature that allows a prog to > jump over another > >>>> prog and continue the chain. Priority concept cannot express that. > >>>> Since we'd have to add some "policy program" anyway for use cases > like this > >>>> let's keep things as simple as possible? > >>>> Then maybe we can adopt this "as-simple-as-possible" to XDP hooks ? > >>>> And allow bpf progs chaining in the kernel with "run_me_first" > vs "run_me_anywhere" > >>>> in both tcx and xdp ? > >>>> Naturally "run_me_first" prog will be the only one. No need for > F_REPLACE flags, etc. > >>>> The owner of "run_me_first" will update its prog through > bpf_link_update. > >>>> "run_me_anywhere" will add to the end of the chain. > >>>> In XDP for compatibility reasons "run_me_first" will be the default. > >>>> Since only one prog can be enqueued with such flag it will match > existing single prog behavior. > >>>> Well behaving progs will use (like xdp-tcpdump or monitoring progs) > will use "run_me_anywhere". > >>>> I know it's far from covering plenty of cases that we've discussed > for long time, > >>>> but prio concept isn't really covering them either. > >>>> We've struggled enough with single xdp prog, so certainly not > advocating for that. > >>>> Another alternative is to do: "queue_at_head" vs "queue_at_tail". > Just as simple. > >>>> Both simple versions have their pros and cons and don't cover > everything, > >>>> but imo both are better than prio. > >>> > >>> Yeah, it's kind of tricky, imho. The 'run_me_first' > vs 'run_me_anywhere' are two > >>> use cases that should be covered (and actually we kind of do this in > this set, too, > >>> with the prios via prio=x vs prio=0). Given users will only be > consuming the APIs > >>> via libs like libbpf, this can also be abstracted this way w/o users > having to be > >>> aware of prios. > >> > >> but the patchset tells different story. > >> Prio gets exposed everywhere in uapi all the way to bpftool > >> when it's right there for users to understand. > >> And that's the main problem with it. > >> The user don't want to and don't need to be aware of it, > >> but uapi forces them to pick the priority. > >> > >>> Anyway, where it gets tricky would be when things depend on ordering, > >>> e.g. you have BPF progs doing: policy, monitoring, lb, monitoring, > encryption, which > >>> would be sth you can build today via tc BPF: so policy one acts as a > prefilter for > >>> various cidr ranges that should be blocked no matter what, then > monitoring to sample > >>> what goes into the lb, then lb itself which does snat/dnat, then > monitoring to see what > >>> the corresponding pkt looks that goes to backend, and maybe > encryption to e.g. send > >>> the result to wireguard dev, so it's encrypted from lb node to > backend. > >> > >> That's all theory. Your cover letter example proves that in > >> real life different service pick the same priority. > >> They simply don't know any better. > >> prio is an unnecessary magic that apps _have_ to pick, > >> so they just copy-paste and everyone ends up using the same. > >> > >>> For such > >>> example, you'd need prios as the 'run_me_anywhere' doesn't guarantee > order, so there's > >>> a case for both scenarios (concrete layout vs loose one), and for > latter we could > >>> start off with and internal prio around x (e.g. 16k), so there's room > to attach in > >>> front via fixed prio, but also append to end for 'don't care', and > that could be > >>> from lib pov the default/main API whereas prio would be some kind of > extended one. > >>> Thoughts? > >> > >> If prio was not part of uapi, like kernel internal somehow, > >> and there was a user space daemon, systemd, or another bpf prog, > >> module, whatever that users would interface to then > >> the proposed implementation of prio would totally make sense. > >> prio as uapi is not that. > > > > A good analogy to this issue might be systemd's unit files.. you > specify dependencies > > for your own <unit> file via 'Wants=<unitA>', and ordering > via 'Before=<unitB>' and > > 'After=<unitC>' and they refer to other unit files. I think that is > generally okay, > > you don't deal with prio numbers, but rather some kind textual > representation. However > > user/operator will have to deal with dependencies/ordering one way or > another, the > > problem here is that we deal with kernel and loader talks to kernel > directly so it > > has no awareness of what else is running or could be running, so apps > needs to deal > > with it somehow (and it cannot without external help). > I was thinking a little about how this might work; i.e., how can the > kernel expose the required knobs to allow a system policy to be > implemented without program loading having to talk to anything other > than the syscall API? > How about we only expose prepend/append in the prog attach UAPI, and > then have a kernel function that does the sorting like: > int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t num_progs, struct > bpf_prog *new_prog, bool append) > where the default implementation just appends/prepends to the array in > progs depending on the value of 'appen'. > And then use the __weak linking trick (or maybe struct_ops with a member > for TXC, another for XDP, etc?) to allow BPF to override the function > wholesale and implement whatever ordering it wants? I.e., allow it can > to just shift around the order of progs in the 'progs' array whenever a > program is loaded/unloaded? > This way, a userspace daemon can implement any policy it wants by just > attaching to that hook, and keeping things like how to express > dependencies as a userspace concern? What if we do the above, but instead of simple global 'attach first/last', the default api would be: - attach before <target_fd> - attach after <target_fd> - attach before target_fd=-1 == first - attach after target_fd=-1 == last ? That might be flexible enough by default to allow users to append/prepend to any existing program in the chain (say, for monitoring). Flexible enough for some central daemons to do systemd-style policy. And, with bpf_add_new_tcx_prog, flexible enough to implement any policy?
Powered by blists - more mailing lists