lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20231218164142.0b10e29d@kernel.org> Date: Mon, 18 Dec 2023 16:41:42 -0800 From: Jakub Kicinski <kuba@...nel.org> To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Michael Chan <michael.chan@...adcom.com> Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, Boqun Feng <boqun.feng@...il.com>, Daniel Borkmann <daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>, Frederic Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...hat.com>, Paolo Abeni <pabeni@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>, Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org> Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking. On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote: > On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote: > > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote: > > > The proposed way out is to introduce explicit per-CPU locks for > > > resources which are protected by local_bh_disable() and use those only > > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds. > > > > As I said at LPC, complicating drivers with odd locking constructs > > is a no go for me. > > I misunderstood it then as I assumed you wanted to ease the work while I > was done which every driver after (hopefully) understanding what is > possible/ needed and what not. We do speak here about 15++? My main concern is that it takes the complexity of writing network device drivers to a next level. It's already hard enough to implement XDP correctly. "local lock" and "guard"? Too complicated :( Or "unmaintainable" as in "too much maintainer's time will be spent reviewing code that gets this wrong". > Now. The pattern is usually > | act = bpf_prog_run_xdp(xdp_prog, &xdp); > | switch (act) { > | case XDP_REDIRECT: > | ret = xdp_do_redirect(netdev, &xdp, xdp_prog))) > | if (ret) > | goto XDP_ABORTED; > | xdp_redir++ or so; > > so we might be able to turn this into something that covers both and > returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged > into > > | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct > | bpf_prog *prog, struct xdp_buff *xdp) > | { > | u32 act; > | int ret; > | > | act = bpf_prog_run_xdp(prog, xdp); > | if (act == XDP_REDIRECT) { > | ret = xdp_do_redirect(netdev, xdp, prog); > | if (ret < 0) > | act = XDP_ABORTED; > | } > | return act; > | } If we could fold the DROP case into this -- even better! > so the lock can be put inside the function and all drivers use this > function. > > From looking through drivers/net/ethernet/, this should work for most > drivers: > - amazon/ena > - aquantia/atlantic > - engleder/tsnep > - freescale/enetc > - freescale/fec > - intel/igb > - intel/igc > - marvell/mvneta > - marvell/mvpp2 > - marvell/octeontx2 > - mediatek/mtk > - mellanox/mlx5 > - microchip/lan966x > - microsoft/mana > - netronome/nfp (two call paths with no support XDP_REDIRECT) > - sfc/rx > - sfc/siena (that offset pointer can be moved) > - socionext/netsec > - stmicro/stmmac > > A few do something custom/ additionally between bpf_prog_run_xdp() and > xdp_do_redirect(): > > - broadcom/bnxt > calculates length, offset, data pointer. DMA unmaps + memory > allocations before redirect. Just looked at this one. The recalculation is probably for the PASS / TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like a bug (hi, Michael!) > - freescale/dpaa2 > - freescale/dpaa > sets xdp.data_hard_start + frame_sz, unmaps DMA. > > - fungible/funeth > conditional redirect. > > - google/gve > Allocates a new packet for redirect. > > - intel/ixgbe > - intel/i40e > - intel/ice > Failure in the ZC case is different from XDP_ABORTED, depends on the > error from xdp_do_redirect()) > > - mellanox/mlx4/ > calculates page_offset. > > - qlogic/qede > DMA unmap and buffer alloc. > > - ti/cpsw_priv > recalculates length (pointer). > > and a few more don't support XDP_REDIRECT: > > - cavium/thunder > does not support XDP_REDIRECT, calculates length, offset. > > - intel/ixgbevf > does not support XDP_REDIRECT > > I don't understand why some driver need to recalculate data_hard_start, > length and so on and others don't. This might be only needed for the > XDP_TX case or not needed… > Also I'm not sure about the dma unmaps and skb allocations. The new skb > allocation can be probably handled before running the bpf prog but then > in the XDP_PASS case it is a waste… > And the DMA unmaps. Only a few seem to need it. Maybe it can be done > before running the BPF program. After all the bpf may look into the skb. > > > If that is no go, then the only thing that comes to mind is (as you > mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it > in xdp_do_redirect(). This would require that every driver invokes > xdp_do_redirect() even not if it is not supporting it (by setting netdev > to NULL or so). To make progress on other parts of the stack we could also take the local lock around all of napi->poll() for now..
Powered by blists - more mailing lists