Message-ID: <20231218164142.0b10e29d@kernel.org>
Date: Mon, 18 Dec 2023 16:41:42 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Michael Chan
<michael.chan@...adcom.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, "David S. Miller"
<davem@...emloft.net>, Boqun Feng <boqun.feng@...il.com>, Daniel Borkmann
<daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>, Frederic
Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...hat.com>, Paolo
Abeni <pabeni@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Thomas
Gleixner <tglx@...utronix.de>, Waiman Long <longman@...hat.com>, Will
Deacon <will@...nel.org>
Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.
On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.
> >
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no go for me.
>
> I misunderstood it then, as I assumed you wanted to ease the work while
> I went through every driver after (hopefully) understanding what is
> possible/needed and what is not. We are speaking here about 15+ drivers?
My main concern is that it takes the complexity of writing network
device drivers to the next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".
> Now, the pattern is usually:
> | act = bpf_prog_run_xdp(xdp_prog, &xdp);
> | switch (act) {
> | case XDP_REDIRECT:
> | ret = xdp_do_redirect(netdev, &xdp, xdp_prog);
> | if (ret)
> | goto XDP_ABORTED;
> | xdp_redir++ or so;
>
> so we might be able to turn this into something that covers both cases
> and returns either XDP_REDIRECT or XDP_ABORTED. This could be merged
> into
>
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
> | bpf_prog *prog, struct xdp_buff *xdp)
> | {
> | u32 act;
> | int ret;
> |
> | act = bpf_prog_run_xdp(prog, xdp);
> | if (act == XDP_REDIRECT) {
> | ret = xdp_do_redirect(dev, xdp, prog);
> | if (ret < 0)
> | act = XDP_ABORTED;
> | }
> | return act;
> | }
If we could fold the DROP case into this -- even better!
> so the lock can be put inside the function and all drivers use this
> function.
>
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths with no support for XDP_REDIRECT)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
>
> A few do something custom/additional between bpf_prog_run_xdp() and
> xdp_do_redirect():
>
> - broadcom/bnxt
> calculates length, offset, data pointer. DMA unmaps + memory
> allocations before redirect.
Just looked at this one. The recalculation is probably for the PASS /
TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like
a bug (hi, Michael!)
> - freescale/dpaa2
> - freescale/dpaa
> sets xdp.data_hard_start + frame_sz, unmaps DMA.
>
> - fungible/funeth
> conditional redirect.
>
> - google/gve
> Allocates a new packet for redirect.
>
> - intel/ixgbe
> - intel/i40e
> - intel/ice
> Failure in the ZC case is different from XDP_ABORTED; it depends on
> the error from xdp_do_redirect().
>
> - mellanox/mlx4/
> calculates page_offset.
>
> - qlogic/qede
> DMA unmap and buffer alloc.
>
> - ti/cpsw_priv
> recalculates length (pointer).
>
> and a few more don't support XDP_REDIRECT:
>
> - cavium/thunder
> does not support XDP_REDIRECT, calculates length, offset.
>
> - intel/ixgbevf
> does not support XDP_REDIRECT
>
> I don't understand why some drivers need to recalculate data_hard_start,
> length and so on while others don't. This might only be needed for the
> XDP_TX case, or not needed at all…
> Also I'm not sure about the DMA unmaps and skb allocations. The new skb
> allocation can probably be handled before running the BPF program, but
> then in the XDP_PASS case it is a waste…
> As for the DMA unmaps: only a few drivers seem to need them. Maybe they
> can be done before running the BPF program. After all, the BPF program
> may look into the skb.
>
>
> If that is a no-go, then the only thing that comes to mind is (as you
> mentioned at LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require that every driver invokes
> xdp_do_redirect() even if it does not support redirect (by setting
> netdev to NULL or so).
To make progress on other parts of the stack we could also take
the local lock around all of napi->poll() for now..