Message-ID: <CAMB2axOUP1q9O1ViA_kzOvHDHKOYYahH=QMOvvJfffwgoYPGyA@mail.gmail.com>
Date: Mon, 15 Sep 2025 13:39:33 -0700
From: Amery Hung <ameryhung@...il.com>
To: Tariq Toukan <ttoukan.linux@...il.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, andrew+netdev@...n.ch,
davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com, kuba@...nel.org,
martin.lau@...nel.org, noren@...dia.com, dtatulea@...dia.com,
saeedm@...dia.com, tariqt@...dia.com, mbloch@...dia.com, cpaasch@...nai.com,
kernel-team@...a.com
Subject: Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from
non-linear xdp_buff for striding RQ
On Thu, Sep 11, 2025 at 2:19 AM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>
>
>
> On 10/09/2025 6:41, Amery Hung wrote:
> > XDP programs can change the layout of an xdp_buff through
> > bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> > cannot assume the size of the linear data area or of the fragments.
> > Fix the bug in mlx5 by generating the skb according to the xdp_buff
> > after XDP programs run.
> >
> > Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> > layout of an xdp_buff to be unchanged. That is, the linear data area
> > continues to be empty and fragments remain the same. This may cause
> > the driver to generate an erroneous skb or trigger a kernel
> > warning. When an XDP program adds linear data through
> > bpf_xdp_adjust_head(), the linear data is ignored, because
> > mlx5e_build_linear_skb() builds an skb without linear data and then
> > pulls data from fragments to fill the linear data area. When an XDP
> > program shrinks the non-linear data through bpf_xdp_adjust_tail(),
> > the delta passed to __pskb_pull_tail() may exceed the actual non-linear
> > data size and trigger the BUG_ON in it.
> >
> > To fix the issue, first record the original number of fragments. If the
> > number of fragments changes after the XDP program runs, rewind the end
> > fragment pointer by the difference and recalculate the truesize. Then,
> > build the skb with the linear data area matching the xdp_buff. Finally,
> > only pull data in if there is non-linear data and fill the linear part
> > up to 256 bytes.
> >
> > Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> > Signed-off-by: Amery Hung <ameryhung@...il.com>
> > ---
>
> Thanks for your patch!
>
> > .../net/ethernet/mellanox/mlx5/core/en_rx.c | 21 ++++++++++++++++---
> > 1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 1d3eacfd0325..fc881d8d2d21 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > u32 byte_cnt = cqe_bcnt;
> > struct skb_shared_info *sinfo;
> > unsigned int truesize = 0;
> > + u32 pg_consumed_bytes;
> > struct bpf_prog *prog;
> > struct sk_buff *skb;
> > u32 linear_frame_sz;
> > @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >
> > while (byte_cnt) {
> > /* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> > - u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> > + pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> >
> > if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
> > truesize += pg_consumed_bytes;
> > @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > }
> >
> > if (prog) {
> > + u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> > + u32 len;
> > +
> > if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
> > if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> > struct mlx5e_frag_page *pfp;
> >
> > + frag_page -= old_nr_frags - sinfo->nr_frags;
> > +
> > for (pfp = head_page; pfp < frag_page; pfp++)
> > pfp->frags++;
> >
> > @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > return NULL; /* page/packet was consumed by XDP */
> > }
> >
> > + nr_frags_free = old_nr_frags - sinfo->nr_frags;
> > + frag_page -= nr_frags_free;
> > + truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> > + (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
>
> This is a very complicated calculation that results in zero in the
> common case nr_frags_free == 0.
> Maybe better to do it conditionally under if (nr_frags_free), together
> with 'frag_page -= nr_frags_free;'?
>

I will change the recalculation back to being conditional.

> We never use stride_size > PAGE_SIZE so the second alignment here is
> redundant.

Got it. I will remove the ALIGN() from the second term.
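
The two suggestions above (make the rewind conditional on nr_frags_free, and drop the ALIGN() on the full-page term) can be sketched as standalone arithmetic. This is only an illustration, not the next revision of the patch: sketch_truesize_rewind(), sketch_align() and the SKETCH_* constants are hypothetical names, a 4 KiB page is assumed, and every released fragment other than the last one is assumed to have covered a full page.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Round x up to a multiple of a, where a is a power of two. */
static uint32_t sketch_align(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: amount to subtract from truesize after the XDP
 * program released nr_frags_free trailing fragments.
 *
 * last_frag_bytes plays the role of pg_consumed_bytes for the final
 * fragment; the other released fragments are assumed to be full pages.
 */
static uint32_t sketch_truesize_rewind(uint32_t last_frag_bytes,
				       uint8_t nr_frags_free,
				       uint8_t log_stride_sz)
{
	/* Stride size never exceeds the page size, so a full page is
	 * already stride-aligned and needs no ALIGN(). */
	assert(log_stride_sz <= SKETCH_PAGE_SHIFT);

	/* Common case: the fragment count did not change, nothing to undo. */
	if (!nr_frags_free)
		return 0;

	return sketch_align(last_frag_bytes, 1u << log_stride_sz) +
	       (uint32_t)(nr_frags_free - 1) * SKETCH_PAGE_SIZE;
}
```

With a 256-byte stride (log_stride_sz = 8), releasing one fragment that held 100 bytes gives back 256 bytes of truesize, and releasing three fragments gives back 256 + 2 * 4096 bytes. In the driver the same guard would also cover 'frag_page -= nr_frags_free;'.
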
>
> Also, what about truesize changes due to adjust head, i.e. when we
> extend the header into the linear part?
> I think 'len' calculated below is missing from truesize.

The linear part is accounted for later, in mlx5e_build_linear_skb() ->
napi_build_skb() -> ... -> __finalize_skb_around().

> > +
> > + len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> > +
> > skb = mlx5e_build_linear_skb(
> > rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> > - mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> > + mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
> > mxbuf->xdp.data - mxbuf->xdp.data_meta);
> > if (unlikely(!skb)) {
> > mlx5e_page_release_fragmented(rq->page_pool,
> > @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> > do
> > pagep->frags++;
> > while (++pagep < frag_page);
> > +
> > + headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> > + __pskb_pull_tail(skb, headlen);
> > }
> > - __pskb_pull_tail(skb, headlen);
> > } else {
> > dma_addr_t addr;
> >
>
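
For the last hunk, the pull length ("fill the linear part up to 256 bytes") can be sketched as below. The names sketch_pull_len() and SKETCH_RX_MAX_HEAD are hypothetical stand-ins for the headlen computation and MLX5E_RX_MAX_HEAD. The early return for a linear area that already holds 256 bytes or more is an assumption of this sketch, not something the patch as posted does.

```c
#include <stdint.h>

#define SKETCH_RX_MAX_HEAD 256u  /* the 256-byte cap from the commit message */

/*
 * Hypothetical helper: number of bytes to pull from the fragments into
 * the linear area.
 *
 * linear_len: bytes already in the linear area (data_end - data)
 * data_len:   bytes currently held in fragments (skb->data_len)
 */
static uint32_t sketch_pull_len(uint32_t linear_len, uint32_t data_len)
{
	uint32_t room;

	/* Assumption: nothing is pulled once the linear area is full. */
	if (linear_len >= SKETCH_RX_MAX_HEAD)
		return 0;

	room = SKETCH_RX_MAX_HEAD - linear_len;

	/* Never pull more than the fragments actually hold. */
	return room < data_len ? room : data_len;
}
```

For example, with 100 bytes already linear and 1000 bytes in fragments the sketch pulls 156 bytes; with only 50 bytes left in fragments it pulls 50, which keeps the value passed to __pskb_pull_tail() within the non-linear data size.
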