lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231218164142.0b10e29d@kernel.org>
Date: Mon, 18 Dec 2023 16:41:42 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Michael Chan
 <michael.chan@...adcom.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, "David S. Miller"
 <davem@...emloft.net>, Boqun Feng <boqun.feng@...il.com>, Daniel Borkmann
 <daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>, Frederic
 Weisbecker <frederic@...nel.org>, Ingo Molnar <mingo@...hat.com>, Paolo
 Abeni <pabeni@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Thomas
 Gleixner <tglx@...utronix.de>, Waiman Long <longman@...hat.com>, Will
 Deacon <will@...nel.org>
Subject: Re: [PATCH net-next 00/24] locking: Introduce nested-BH locking.

On Mon, 18 Dec 2023 18:23:31 +0100 Sebastian Andrzej Siewior wrote:
> On 2023-12-15 14:50:59 [-0800], Jakub Kicinski wrote:
> > On Fri, 15 Dec 2023 18:07:19 +0100 Sebastian Andrzej Siewior wrote:  
> > > The proposed way out is to introduce explicit per-CPU locks for
> > > resources which are protected by local_bh_disable() and use those only
> > > on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.  
> > 
> > As I said at LPC, complicating drivers with odd locking constructs
> > is a no go for me.  
> 
> I misunderstood it then as I assumed you wanted to ease the work while I
> was done which every driver after (hopefully) understanding what is
> possible/ needed and what not. We do speak here about 15++?

My main concern is that it takes the complexity of writing network
device drivers to a next level. It's already hard enough to implement
XDP correctly. "local lock" and "guard"? Too complicated :(
Or "unmaintainable" as in "too much maintainer's time will be spent
reviewing code that gets this wrong".

> Now. The pattern is usually
> |	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> |	switch (act) {
> |	case XDP_REDIRECT:
> |		ret = xdp_do_redirect(netdev, &xdp, xdp_prog)))
> |		if (ret)
> |			goto XDP_ABORTED;
> |		xdp_redir++ or so;
> 
> so we might be able to turn this into something that covers both and
> returns either XDP_REDIRECT or XDP_ABORTED. So this could be merged
> into
> 
> | u32 bpf_prog_run_xdp_and_redirect(struct net_device *dev, const struct
> | 				  bpf_prog *prog, struct xdp_buff *xdp)
> | {
> | 	u32 act;
> | 	int ret;
> | 
> | 	act = bpf_prog_run_xdp(prog, xdp);
> | 	if (act == XDP_REDIRECT) {
> | 		ret = xdp_do_redirect(netdev, xdp, prog);
> | 		if (ret < 0)
> | 			act = XDP_ABORTED;
> | 	}
> | 	return act;
> | }

If we could fold the DROP case into this -- even better!

> so the lock can be put inside the function and all drivers use this
> function.
> 
> From looking through drivers/net/ethernet/, this should work for most
> drivers:
> - amazon/ena
> - aquantia/atlantic
> - engleder/tsnep
> - freescale/enetc
> - freescale/fec
> - intel/igb
> - intel/igc
> - marvell/mvneta
> - marvell/mvpp2
> - marvell/octeontx2
> - mediatek/mtk
> - mellanox/mlx5
> - microchip/lan966x
> - microsoft/mana
> - netronome/nfp (two call paths with no support XDP_REDIRECT)
> - sfc/rx
> - sfc/siena (that offset pointer can be moved)
> - socionext/netsec
> - stmicro/stmmac
> 
> A few do something custom/ additionally between bpf_prog_run_xdp() and
>   xdp_do_redirect():
> 
> - broadcom/bnxt
>   calculates length, offset, data pointer. DMA unmaps + memory
>   allocations before redirect.

Just looked at this one. The recalculation is probably for the PASS /
TX cases, REDIRECT / DROP shouldn't care. The DMA unmap looks like 
a bug (hi, Michael!)

> - freescale/dpaa2
> - freescale/dpaa
>   sets xdp.data_hard_start + frame_sz, unmaps DMA.
> 
> - fungible/funeth
>   conditional redirect.
> 
> - google/gve
>   Allocates a new packet for redirect.
> 
> - intel/ixgbe
> - intel/i40e
> - intel/ice
>   Failure in the ZC case is different from XDP_ABORTED, depends on the
>   error from xdp_do_redirect())
> 
> - mellanox/mlx4/
>   calculates page_offset.
> 
> - qlogic/qede
>   DMA unmap and buffer alloc.
> 
> - ti/cpsw_priv
>   recalculates length (pointer).
> 
> and a few more don't support XDP_REDIRECT:
> 
> - cavium/thunder
>   does not support XDP_REDIRECT, calculates length, offset.
> 
> - intel/ixgbevf
>   does not support XDP_REDIRECT
> 
> I don't understand why some driver need to recalculate data_hard_start,
> length and so on and others don't. This might be only needed for the
> XDP_TX case or not needed…
> Also I'm not sure about the dma unmaps and skb allocations. The new skb
> allocation can be probably handled before running the bpf prog but then
> in the XDP_PASS case it is a waste…
> And the DMA unmaps. Only a few seem to need it. Maybe it can be done
> before running the BPF program. After all the bpf may look into the skb.
> 
> 
> If that is no go, then the only thing that comes to mind is (as you
> mentioned on LPC) to acquire the lock in bpf_prog_run_xdp() and drop it
> in xdp_do_redirect(). This would require that every driver invokes
> xdp_do_redirect() even not if it is not supporting it (by setting netdev
> to NULL or so).

To make progress on other parts of the stack we could also take 
the local lock around all of napi->poll() for now..

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ