[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<PAXPR04MB85109A26C0636F3EF33174D188B82@PAXPR04MB8510.eurprd04.prod.outlook.com>
Date: Mon, 21 Apr 2025 06:05:22 +0000
From: Wei Fang <wei.fang@....com>
To: Vladimir Oltean <vladimir.oltean@....com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>
CC: Claudiu Manoil <claudiu.manoil@....com>, Clark Wang
<xiaoning.wang@....com>, Vlatko Markovikj <vlatko.markovikj@...s.com>, Andrew
Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann
<daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>, John
Fastabend <john.fastabend@...il.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
Toke Hoiland-Jorgensen <toke@...hat.com>, Alexander Lobakin
<aleksander.lobakin@...el.com>, "imx@...ts.linux.dev" <imx@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: RE: [PATCH net 3/3] net: enetc: fix frame corruption on
bpf_xdp_adjust_head/tail() and XDP_PASS
> Vlatko Markovikj reported that XDP programs attached to ENETC do not
> work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(),
> combined with the XDP_PASS verdict. A typical use case is to add or
> remove a VLAN tag.
>
> The resulting sk_buff passed to the stack is corrupted, because the
> algorithm used by the driver for XDP_PASS is to unwind the current
> buffer pointer in the RX ring and to re-process the current frame with
> enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP
> may have modified the geometry of the buffer, which we then are
> completely unaware of. We are looking at a modified buffer with the
> original geometry.
>
> The initial reaction, both from me and from Vlatko, was to shop around
> the kernel for code to steal that would calculate a delta between the
> old and the new XDP buffer geometry, and apply that to the sk_buff too.
> We noticed that veth and generic xdp have such code.
>
> The headroom adjustment is pretty uncontroversial, but what turned out
> severely problematic is the tailroom.
>
> veth has this snippet:
>
> __skb_put(skb, off); /* positive on grow, negative on shrink */
>
> which on first sight looks decent enough, except __skb_put() takes an
> "unsigned int" for the second argument, and the arithmetic seems to only
> work correctly by coincidence. Second issue, __skb_put() contains a
> SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread.
> The skb may still be nonlinear at that point - it only becomes linear
> later when resetting skb->data_len to zero.
>
> To avoid the above, bpf_prog_run_generic_xdp() does this instead:
>
> skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
> skb->len += off; /* positive on grow, negative on shrink */
>
> which is more open-coded, uses lower-level functions and is in general a
> bit too much to spread around in driver code.
>
> Then there is the snippet:
>
> if (xdp_buff_has_frags(xdp))
> skb->data_len = skb_shinfo(skb)->xdp_frags_size;
> else
> skb->data_len = 0;
>
> One would have expected __pskb_trim() to be the function of choice for
> this task. But it's not used in veth/xdpgeneric because the extraneous
> fragments were _already_ freed by bpf_xdp_adjust_tail() ->
> bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing
> memory for the skb frags and the xdp frags is the same, but they don't
> keep individual references.
>
> In fact, that is the biggest reason why this snippet cannot be reused
> as-is, because ENETC temporarily constructs an skb with the original len
> and the original number of frags. Because the extraneous frags are
> already freed by bpf_xdp_adjust_tail() and returned to the page
> allocator, it means the entire approach of using enetc_build_skb() is
> questionable for XDP_PASS. To avoid that, one would need to elevate the
> page refcount of all frags before calling bpf_prog_run_xdp() and drop it
> after XDP_PASS.
>
> There are other things that are missing in ENETC's handling of XDP_PASS,
> like for example updating skb_shinfo(skb)->meta_len.
>
> These are all handled correctly and cleanly in commit 539c1fba1ac7
> ("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in
> Dec 2024, and in addition might even be quicker that way. I have a very
> strong preference towards backporting that commit for "stable", and that
> is what is used to fix the handling bugs. It is way too messy to go
> this deep into the guts of an sk_buff from the code of a device driver.
>
> Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS")
> Reported-by: Vlatko Markovikj <vlatko.markovikj@...s.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 26 +++++++++++---------
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c
> b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 74721995cb1f..3ee52f4b1166 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -1878,11 +1878,10 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
>
> while (likely(rx_frm_cnt < work_limit)) {
> union enetc_rx_bd *rxbd, *orig_rxbd;
> - int orig_i, orig_cleaned_cnt;
> struct xdp_buff xdp_buff;
> struct sk_buff *skb;
> + int orig_i, err;
> u32 bd_status;
> - int err;
>
> rxbd = enetc_rxbd(rx_ring, i);
> bd_status = le32_to_cpu(rxbd->r.lstatus);
> @@ -1897,7 +1896,6 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
> break;
>
> orig_rxbd = rxbd;
> - orig_cleaned_cnt = cleaned_cnt;
> orig_i = i;
>
> enetc_build_xdp_buff(rx_ring, bd_status, &rxbd, &i,
> @@ -1925,15 +1923,21 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
> *rx_ring,
> rx_ring->stats.xdp_drops++;
> break;
> case XDP_PASS:
> - rxbd = orig_rxbd;
> - cleaned_cnt = orig_cleaned_cnt;
> - i = orig_i;
> -
> - skb = enetc_build_skb(rx_ring, bd_status, &rxbd,
> - &i, &cleaned_cnt,
> - ENETC_RXB_DMA_SIZE_XDP);
> - if (unlikely(!skb))
> + skb = xdp_build_skb_from_buff(&xdp_buff);
> + /* Probably under memory pressure, stop NAPI */
> + if (unlikely(!skb)) {
> + enetc_xdp_drop(rx_ring, orig_i, i);
> + rx_ring->stats.xdp_drops++;
> goto out;
> + }
> +
> + enetc_get_offloads(rx_ring, orig_rxbd, skb);
> +
> + /* These buffers are about to be owned by the stack.
> + * Update our buffer cache (the rx_swbd array elements)
> + * with their other page halves.
> + */
> + enetc_bulk_flip_buff(rx_ring, orig_i, i);
>
> napi_gro_receive(napi, skb);
> break;
> --
> 2.34.1
Reviewed-by: Wei Fang <wei.fang@....com>
Powered by blists - more mailing lists