lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <PAXPR04MB85109A26C0636F3EF33174D188B82@PAXPR04MB8510.eurprd04.prod.outlook.com>
Date: Mon, 21 Apr 2025 06:05:22 +0000
From: Wei Fang <wei.fang@....com>
To: Vladimir Oltean <vladimir.oltean@....com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>
CC: Claudiu Manoil <claudiu.manoil@....com>, Clark Wang
	<xiaoning.wang@....com>, Vlatko Markovikj <vlatko.markovikj@...s.com>, Andrew
 Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann
	<daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>, John
 Fastabend <john.fastabend@...il.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
	Toke Hoiland-Jorgensen <toke@...hat.com>, Alexander Lobakin
	<aleksander.lobakin@...el.com>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: RE: [PATCH net 3/3] net: enetc: fix frame corruption on
 bpf_xdp_adjust_head/tail() and XDP_PASS

> Vlatko Markovikj reported that XDP programs attached to ENETC do not
> work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(),
> combined with the XDP_PASS verdict. A typical use case is to add or
> remove a VLAN tag.
> 
> The resulting sk_buff passed to the stack is corrupted, because the
> algorithm used by the driver for XDP_PASS is to unwind the current
> buffer pointer in the RX ring and to re-process the current frame with
> enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP
> may have modified the geometry of the buffer, which we then are
> completely unaware of. We are looking at a modified buffer with the
> original geometry.
> 
> The initial reaction, both from me and from Vlatko, was to shop around
> the kernel for code to steal that would calculate a delta between the
> old and the new XDP buffer geometry, and apply that to the sk_buff too.
> We noticed that veth and generic xdp have such code.
> 
> The headroom adjustment is pretty uncontroversial, but what turned out
> severely problematic is the tailroom.
> 
> veth has this snippet:
> 
> 		__skb_put(skb, off); /* positive on grow, negative on shrink */
> 
> which on first sight looks decent enough, except __skb_put() takes an
> "unsigned int" for the second argument, and the arithmetic seems to only
> work correctly by coincidence. Second issue, __skb_put() contains a
> SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread.
> The skb may still be nonlinear at that point - it only becomes linear
> later when resetting skb->data_len to zero.
> 
> To avoid the above, bpf_prog_run_generic_xdp() does this instead:
> 
> 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
> 		skb->len += off; /* positive on grow, negative on shrink */
> 
> which is more open-coded, uses lower-level functions and is in general a
> bit too much to spread around in driver code.
> 
> Then there is the snippet:
> 
> 	if (xdp_buff_has_frags(xdp))
> 		skb->data_len = skb_shinfo(skb)->xdp_frags_size;
> 	else
> 		skb->data_len = 0;
> 
> One would have expected __pskb_trim() to be the function of choice for
> this task. But it's not used in veth/xdpgeneric because the extraneous
> fragments were _already_ freed by bpf_xdp_adjust_tail() ->
> bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing
> memory for the skb frags and the xdp frags is the same, but they don't
> keep individual references.
> 
> In fact, that is the biggest reason why this snippet cannot be reused
> as-is, because ENETC temporarily constructs an skb with the original len
> and the original number of frags. Because the extraneous frags are
> already freed by bpf_xdp_adjust_tail() and returned to the page
> allocator, it means the entire approach of using enetc_build_skb() is
> questionable for XDP_PASS. To avoid that, one would need to elevate the
> page refcount of all frags before calling bpf_prog_run_xdp() and drop it
> after XDP_PASS.
> 
> There are other things that are missing in ENETC's handling of XDP_PASS,
> like for example updating skb_shinfo(skb)->meta_len.
> 
> These are all handled correctly and cleanly in commit 539c1fba1ac7
> ("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in
> Dec 2024, and in addition might even be quicker that way. I have a very
> strong preference towards backporting that commit for "stable", and that
> is what is used to fix the handling bugs. It is way too messy to go
> this deep into the guts of an sk_buff from the code of a device driver.
> 
> Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS")
> Reported-by: Vlatko Markovikj <vlatko.markovikj@...s.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
>  drivers/net/ethernet/freescale/enetc/enetc.c | 26 +++++++++++---------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c
> b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 74721995cb1f..3ee52f4b1166 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -1878,11 +1878,10 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
> 
>  	while (likely(rx_frm_cnt < work_limit)) {
>  		union enetc_rx_bd *rxbd, *orig_rxbd;
> -		int orig_i, orig_cleaned_cnt;
>  		struct xdp_buff xdp_buff;
>  		struct sk_buff *skb;
> +		int orig_i, err;
>  		u32 bd_status;
> -		int err;
> 
>  		rxbd = enetc_rxbd(rx_ring, i);
>  		bd_status = le32_to_cpu(rxbd->r.lstatus);
> @@ -1897,7 +1896,6 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
>  			break;
> 
>  		orig_rxbd = rxbd;
> -		orig_cleaned_cnt = cleaned_cnt;
>  		orig_i = i;
> 
>  		enetc_build_xdp_buff(rx_ring, bd_status, &rxbd, &i,
> @@ -1925,15 +1923,21 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
>  			rx_ring->stats.xdp_drops++;
>  			break;
>  		case XDP_PASS:
> -			rxbd = orig_rxbd;
> -			cleaned_cnt = orig_cleaned_cnt;
> -			i = orig_i;
> -
> -			skb = enetc_build_skb(rx_ring, bd_status, &rxbd,
> -					      &i, &cleaned_cnt,
> -					      ENETC_RXB_DMA_SIZE_XDP);
> -			if (unlikely(!skb))
> +			skb = xdp_build_skb_from_buff(&xdp_buff);
> +			/* Probably under memory pressure, stop NAPI */
> +			if (unlikely(!skb)) {
> +				enetc_xdp_drop(rx_ring, orig_i, i);
> +				rx_ring->stats.xdp_drops++;
>  				goto out;
> +			}
> +
> +			enetc_get_offloads(rx_ring, orig_rxbd, skb);
> +
> +			/* These buffers are about to be owned by the stack.
> +			 * Update our buffer cache (the rx_swbd array elements)
> +			 * with their other page halves.
> +			 */
> +			enetc_bulk_flip_buff(rx_ring, orig_i, i);
> 
>  			napi_gro_receive(napi, skb);
>  			break;
> --
> 2.34.1

Reviewed-by: Wei Fang <wei.fang@....com>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ