netdev - Re: [PATCH bpf-next v2 2/7] bpf: Add fd-based tcx multi-prog infra with link support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoM=SeFagzNMWLHqM7LRXt71pWz7BJax_4rvCnLyARDyWig@mail.gmail.com>
Date: Tue, 4 Jul 2023 17:36:42 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>, ast@...nel.org, andrii@...nel.org, 
	martin.lau@...ux.dev, razor@...ckwall.org, sdf@...gle.com, 
	john.fastabend@...il.com, kuba@...nel.org, dxu@...uu.xyz, joe@...ium.io, 
	toke@...nel.org, davem@...emloft.net, bpf@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v2 2/7] bpf: Add fd-based tcx multi-prog infra
 with link support

Sorry for the late reply, but trying this out now - and have a question:

On Thu, Jun 8, 2023 at 5:25 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Thu, Jun 8, 2023 at 12:46 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > Hi Daniel,
> >
> > On Thu, Jun 8, 2023 at 6:12 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> > >
> > > Hi Jamal,
> > >
> > > On 6/8/23 3:25 AM, Jamal Hadi Salim wrote:
> > > [...]
> > > > A general question (which i think i asked last time as well): who
> > > > decides what comes after/before what prog in this setup? And would
> > > > that same entity not have been able to make the same decision using tc
> > > > priorities?
> > >
> > > Back in the first version of the series I initially coded up this option
> > > that the tc_run() would basically be a fake 'bpf_prog' and it would have,
> > > say, fixed prio 1000. It would get executed via tcx_run() when iterating
> > > via bpf_mprog_foreach_prog() where bpf_prog_run() is called, and then users
> > > could pick for native BPF prio before or after that. But then the feedback
> > > was that sticking to prio is a bad user experience which led to the
> > > development of what is in patch 1 of this series (see the details there).
> > >
> >
> > Thanks. I read the commit message in patch 1 and followed the thread
> > back including some of the discussion we had and i am still
> > disagreeing that this couldnt be solved with a smart priority based
> > scheme - but i think we can move on since this is standalone and
> > doesnt affect tc.
> >
> > Daniel - i am still curious in the new scheme of things how would
> > cilium vs datadog food fight get resolved without some arbitration
> > entity?
> >
> > > > The idea of protecting programs from being unloaded is very welcome
> > > > but feels would have made sense to be a separate patchset (we have
> > > > good need for it). Would it be possible to use that feature in tc and
> > > > xdp?
> > > BPF links are supported for XDP today, just tc BPF is one of the few
> > > remainders where it is not the case, hence the work of this series. What
> > > XDP lacks today however is multi-prog support. With the bpf_mprog concept
> > > that could be addressed with that common/uniform api (and Andrii expressed
> > > interest in integrating this also for cgroup progs), so yes, various hook
> > > points/program types could benefit from it.
> >
> > Is there some sample XDP related i could look at?  Let me describe our
> > use case: lets say we load an ebpf program foo attached to XDP of a
> > netdev  and then something further upstream in the stack is consuming
> > the results of that ebpf XDP program. For some reason someone, at some
> > point, decides to replace the XDP prog with a different one - and the
> > new prog does a very different thing. Could we stop the replacement
> > with the link mechanism you describe? i.e the program is still loaded
> > but is no longer attached to the netdev.
>
> If you initially attached an XDP program using BPF link api
> (LINK_CREATE command in bpf() syscall), then subsequent attachment to
> the same interface (of a new link or program with BPF_PROG_ATTACH)
> will fail until the current BPF link is detached through closing its
> last fd.
>

So this works as advertised. The problem is however not totally solved
because it seems we need a process that's alive to hold the ownership.
If we had a daemon then that would solve it i think (we dont).
Alternatively,  you pin the link. The pinning part can be
circumvented, unless i misunderstood i,e anybody with the right
permissions can remove it.

Am I missing something?

cheers,
jamal

cheers,
jamal

> That is, until we allow multiple attachments of XDP programs to the
> same network interface. But even then, no one will be able to
> accidentally replace attached link, unless they have that link FD and
> replace underlying BPF program.
>
> >
> >
> > > >> +struct tcx_entry {
> > > >> +       struct bpf_mprog_bundle         bundle;
> > > >> +       struct mini_Qdisc __rcu         *miniq;
> > > >> +};
> > > >> +
> > > >
> > > > Can you please move miniq to the front? From where i sit this looks:
> > > > struct tcx_entry {
> > > >          struct bpf_mprog_bundle    bundle
> > > > __attribute__((__aligned__(64))); /*     0  3264 */
> > > >
> > > >          /* XXX last struct has 36 bytes of padding */
> > > >
> > > >          /* --- cacheline 51 boundary (3264 bytes) --- */
> > > >          struct mini_Qdisc *        miniq;                /*  3264     8 */
> > > >
> > > >          /* size: 3328, cachelines: 52, members: 2 */
> > > >          /* padding: 56 */
> > > >          /* paddings: 1, sum paddings: 36 */
> > > >          /* forced alignments: 1 */
> > > > } __attribute__((__aligned__(64)));
> > > >
> > > > That is a _lot_ of cachelines - at the expense of the status quo
> > > > clsact/ingress qdiscs which access miniq.
> > >
> > > Ah yes, I'll fix this up.
> >
> > Thanks.
> >
> > cheers,
> > jamal
> > > Thanks,
> > > Daniel