Message-ID: <20240118073540.GIobmYpD@linutronix.de>
Date: Thu, 18 Jan 2024 08:35:40 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Boqun Feng <boqun.feng@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Frederic Weisbecker <frederic@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Cong Wang <xiyou.wangcong@...il.com>, Hao Luo <haoluo@...gle.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Jiri Olsa <jolsa@...nel.org>, Jiri Pirko <jiri@...nulli.us>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Ronak Doshi <doshir@...are.com>, Song Liu <song@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
VMware PV-Drivers Reviewers <pv-drivers@...are.com>,
Yonghong Song <yonghong.song@...ux.dev>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?
No, but I could probably set it up.
> I chatted with Jesper about this, and he had an idea not too far from
> this: split up the XDP and regular stack processing in two stages, each
> with their individual batching. So whereas right now we're doing
> something like:
>
> run_napi()
> bh_disable()
> for pkt in budget:
> act = run_xdp(pkt)
> if (act == XDP_PASS)
> run_netstack(pkt) // this is the expensive bit
> bh_enable()
>
> We could instead do:
>
> run_napi()
> bh_disable()
> for pkt in budget:
> act = run_xdp(pkt)
> if (act == XDP_PASS)
> add_to_list(pkt, to_stack_list)
> bh_enable()
> // sched point
> bh_disable()
> for pkt in to_stack_list:
> run_netstack(pkt)
> bh_enable()
>
>
> This would limit the batching that blocks everything to only the XDP
> processing itself, which should limit the maximum time spent in the
> blocking state significantly compared to what we have today. The caveat
> being that rearranging things like this is potentially a pretty major
> refactoring task that needs to touch all the drivers (even if some of
> the logic can be moved into the core code in the process). So not really
> sure if this approach is feasible, TBH.
This does not work because bh_disable() does not disable scheduling;
scheduling may still happen. bh_disable() acquires a lock which is
currently the only synchronisation point between, say, two network
drivers doing NAPI. And that is what I want to get rid of.
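For reference, on PREEMPT_RT local_bh_disable() boils down to roughly
the following (heavily simplified sketch based on kernel/softirq.c,
accounting and lockdep details left out):

	struct softirq_ctrl {
		local_lock_t	lock;
		int		cnt;
	};

	static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
		.lock = INIT_LOCAL_LOCK(softirq_ctrl.lock),
	};

	void local_bh_disable(void)
	{
		/*
		 * First BH-disabled section of this task: take the
		 * per-CPU lock. On RT this is a sleeping/PI lock, so
		 * the owner can still be preempted; another NAPI
		 * thread on the same CPU blocks here instead of being
		 * excluded by disabled preemption.
		 */
		if (!current->softirq_disable_cnt)
			local_lock(&softirq_ctrl.lock);
		current->softirq_disable_cnt++;
	}

So even between bh_disable() and bh_enable() the task can be scheduled
out, and the only thing keeping two such sections from running
interleaved on the same CPU is that per-CPU lock, which is the very
dependency this series tries to remove.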
Regarding the expensive bit, i.e. XDP_PASS: this doesn't need locking
as per my proposal, only the REDIRECT piece does.
> > Daniel said netkit doesn't need this locking because it does not
> > support this redirect, and it made me think. Would it work to make
> > the redirect structures part of the bpf_prog structure instead of
> > per-CPU? My understanding is that an eBPF program's data structures
> > are part of it and contain locking, allowing one eBPF program to
> > preempt another one.
> > Making the redirect structures part of the program would make the
> > locking obsolete. Am I missing anything?
>
> This won't work, unfortunately: the same XDP program can be attached to
> multiple interfaces simultaneously, and for hardware with multiple
> receive queues (which is most of the hardware that supports XDP), it can
> even run simultaneously on multiple CPUs on the same interface. This is
> the reason why this is all being kept in per-CPU variables today.
So I started hacking on this yesterday and noticed that you can run
multiple bpf programs. That is how I learned that it won't work.
My plan B is now to move it into task_struct.
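Something roughly along these lines (just a sketch, field and helper
names made up, nesting and the non-XDP users not thought through yet):

	struct task_struct {
		...
		/*
		 * Redirect state for the BPF program currently running
		 * in this task's context. Replaces the per-CPU
		 * bpf_redirect_info, so it follows the task across a
		 * preemption instead of being clobbered by whoever
		 * runs next on the CPU.
		 */
		struct bpf_redirect_info	bpf_redir;
		...
	};

	static inline struct bpf_redirect_info *task_redirect_info(void)
	{
		return &current->bpf_redir;
	}

xdp_do_redirect() and the bpf_redirect()/bpf_redirect_map() helpers
would then use that accessor instead of
this_cpu_ptr(&bpf_redirect_info).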
> -Toke
Sebastian