Message-ID: <20240118073540.GIobmYpD@linutronix.de>
Date: Thu, 18 Jan 2024 08:35:40 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Boqun Feng <boqun.feng@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Eric Dumazet <edumazet@...gle.com>,
Frederic Weisbecker <frederic@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Cong Wang <xiyou.wangcong@...il.com>, Hao Luo <haoluo@...gle.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Jiri Olsa <jolsa@...nel.org>, Jiri Pirko <jiri@...nulli.us>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Ronak Doshi <doshir@...are.com>, Song Liu <song@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
VMware PV-Drivers Reviewers <pv-drivers@...are.com>,
Yonghong Song <yonghong.song@...ux.dev>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?
No, but I could probably set it up.
> I chatted with Jesper about this, and he had an idea not too far from
> this: split up the XDP and regular stack processing in two stages, each
> with their individual batching. So whereas right now we're doing
> something like:
>
> run_napi()
> bh_disable()
> for pkt in budget:
> act = run_xdp(pkt)
> if (act == XDP_PASS)
> run_netstack(pkt) // this is the expensive bit
> bh_enable()
>
> We could instead do:
>
> run_napi()
> bh_disable()
> for pkt in budget:
> act = run_xdp(pkt)
> if (act == XDP_PASS)
> add_to_list(pkt, to_stack_list)
> bh_enable()
> // sched point
> bh_disable()
> for pkt in to_stack_list:
> run_netstack(pkt)
> bh_enable()
>
>
> This would limit the batching that blocks everything to only the XDP
> processing itself, which should limit the maximum time spent in the
> blocking state significantly compared to what we have today. The caveat
> being that rearranging things like this is potentially a pretty major
> refactoring task that needs to touch all the drivers (even if some of
> the logic can be moved into the core code in the process). So not really
> sure if this approach is feasible, TBH.
This does not work because bh_disable() does not disable scheduling;
scheduling may still happen. bh_disable() acquires a lock which is
currently the only synchronisation point between, say, two network
drivers doing NAPI. And that is what I want to get rid of.
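For reference, on PREEMPT_RT local_bh_disable() boils down to roughly
the following (heavily simplified sketch based on kernel/softirq.c,
accounting and lockdep details left out):

	struct softirq_ctrl {
		local_lock_t	lock;
		int		cnt;
	};

	static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
		.lock = INIT_LOCAL_LOCK(softirq_ctrl.lock),
	};

	void local_bh_disable(void)
	{
		/*
		 * First BH-disabled section of this task: take the
		 * per-CPU lock. On RT this is a sleeping/PI lock, so
		 * the owner can still be preempted; another NAPI
		 * thread on the same CPU blocks here instead of being
		 * excluded by disabled preemption.
		 */
		if (!current->softirq_disable_cnt)
			local_lock(&softirq_ctrl.lock);
		current->softirq_disable_cnt++;
	}

So even between bh_disable() and bh_enable() the task can be scheduled
out, and the only thing keeping two such sections from running
interleaved on the same CPU is that per-CPU lock, which is the very
dependency this series tries to remove.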
Regarding the expensive bit, i.e. XDP_PASS: this doesn't need locking
as per my proposal, only the REDIRECT piece does.
> > Daniel said netkit doesn't need this locking because it does not
> > support this redirect, and it made me think. Would it work to make
> > the redirect structures part of the bpf_prog structure instead of
> > per-CPU? My understanding is that an eBPF program's data structures
> > are part of it and contain locking, allowing one eBPF program to
> > preempt another one.
> > Making the redirect structures part of the program would make the
> > locking obsolete. Am I missing anything?
>
> This won't work, unfortunately: the same XDP program can be attached to
> multiple interfaces simultaneously, and for hardware with multiple
> receive queues (which is most of the hardware that supports XDP), it can
> even run simultaneously on multiple CPUs on the same interface. This is
> the reason why this is all being kept in per-CPU variables today.
So I started hacking on this yesterday and noticed that you can run
multiple bpf programs. That is how I learned that it won't work.
My plan B is now to move it into task_struct.
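Something roughly along these lines (just a sketch, field and helper
names made up, nesting and the non-XDP users not thought through yet):

	struct task_struct {
		...
		/*
		 * Redirect state for the BPF program currently running
		 * in this task's context. Replaces the per-CPU
		 * bpf_redirect_info, so it follows the task across a
		 * preemption instead of being clobbered by whoever
		 * runs next on the CPU.
		 */
		struct bpf_redirect_info	bpf_redir;
		...
	};

	static inline struct bpf_redirect_info *task_redirect_info(void)
	{
		return &current->bpf_redir;
	}

xdp_do_redirect() and the bpf_redirect()/bpf_redirect_map() helpers
would then use that accessor instead of
this_cpu_ptr(&bpf_redirect_info).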
> -Toke
Sebastian