Message-ID: <CAMB2axOUP1q9O1ViA_kzOvHDHKOYYahH=QMOvvJfffwgoYPGyA@mail.gmail.com>
Date: Mon, 15 Sep 2025 13:39:33 -0700
From: Amery Hung <ameryhung@...il.com>
To: Tariq Toukan <ttoukan.linux@...il.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, andrew+netdev@...n.ch, 
	davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com, kuba@...nel.org, 
	martin.lau@...nel.org, noren@...dia.com, dtatulea@...dia.com, 
	saeedm@...dia.com, tariqt@...dia.com, mbloch@...dia.com, cpaasch@...nai.com, 
	kernel-team@...a.com
Subject: Re: [PATCH net v1 2/2] net/mlx5e: RX, Fix generating skb from
 non-linear xdp_buff for striding RQ

On Thu, Sep 11, 2025 at 2:19 AM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>
>
>
> On 10/09/2025 6:41, Amery Hung wrote:
> > XDP programs can change the layout of an xdp_buff through
> > bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver
> > cannot assume the size of the linear data area or of the fragments. Fix
> > the bug in mlx5 by generating the skb according to the xdp_buff after
> > the XDP program runs.
> >
> > Currently, when handling multi-buf XDP, the mlx5 driver assumes the
> > layout of an xdp_buff to be unchanged. That is, the linear data area
> > continues to be empty and the fragments remain the same. This may cause
> > the driver to generate an erroneous skb or trigger a kernel
> > warning. When an XDP program adds linear data through
> > bpf_xdp_adjust_head(), the linear data will be ignored, as
> > mlx5e_build_linear_skb() builds an skb without linear data and then
> > pulls data from fragments to fill the linear data area. When an XDP
> > program has shrunk the non-linear data through bpf_xdp_adjust_tail(),
> > the delta passed to __pskb_pull_tail() may exceed the actual non-linear
> > data size and trigger the BUG_ON in it.
> >
> > To fix the issue, first record the original number of fragments. If the
> > number of fragments changes after the XDP program runs, rewind the end
> > fragment pointer by the difference and recalculate the truesize. Then,
> > build the skb with the linear data area matching the xdp_buff. Finally,
> > only pull data in if there is non-linear data and fill the linear part
> > up to 256 bytes.
> >
> > Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ")
> > Signed-off-by: Amery Hung <ameryhung@...il.com>
> > ---
>
> Thanks for your patch!
>
> >   .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 21 ++++++++++++++++---
> >   1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 1d3eacfd0325..fc881d8d2d21 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -2013,6 +2013,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >       u32 byte_cnt       = cqe_bcnt;
> >       struct skb_shared_info *sinfo;
> >       unsigned int truesize = 0;
> > +     u32 pg_consumed_bytes;
> >       struct bpf_prog *prog;
> >       struct sk_buff *skb;
> >       u32 linear_frame_sz;
> > @@ -2066,7 +2067,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >
> >       while (byte_cnt) {
> >               /* Non-linear mode, hence non-XSK, which always uses PAGE_SIZE. */
> > -             u32 pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> > +             pg_consumed_bytes = min_t(u32, PAGE_SIZE - frag_offset, byte_cnt);
> >
> >               if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
> >                       truesize += pg_consumed_bytes;
> > @@ -2082,10 +2083,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >       }
> >
> >       if (prog) {
> > +             u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
> > +             u32 len;
> > +
> >               if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
> >                       if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
> >                               struct mlx5e_frag_page *pfp;
> >
> > +                             frag_page -= old_nr_frags - sinfo->nr_frags;
> > +
> >                               for (pfp = head_page; pfp < frag_page; pfp++)
> >                                       pfp->frags++;
> >
> > @@ -2096,9 +2102,16 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >                       return NULL; /* page/packet was consumed by XDP */
> >               }
> >
> > +             nr_frags_free = old_nr_frags - sinfo->nr_frags;
> > +             frag_page -= nr_frags_free;
> > +             truesize -= ALIGN(pg_consumed_bytes, BIT(rq->mpwqe.log_stride_sz)) +
> > +                         (nr_frags_free - 1) * ALIGN(PAGE_SIZE, BIT(rq->mpwqe.log_stride_sz));
>
> This is a very complicated calculation that results in zero in the common
> case nr_frags_free == 0.
> Maybe better to do it conditionally under if (nr_frags_free), together with
> 'frag_page -= nr_frags_free;'?
>
>

Will change the recalculation back to conditional.

> We never use stride_size > PAGE_SIZE, so the second alignment here is
> redundant.

Got it. I will remove the ALIGN for the second part.

>
> Also, what about truesize changes due to adjusting the header, i.e., when
> we extend the header into the linear part?
> I think the 'len' calculated below is missing from truesize.

The truesize of the linear part will be accounted for later, in
mlx5e_build_linear_skb() -> napi_build_skb() -> ... ->
__finalize_skb_around().

> > +
> > +             len = mxbuf->xdp.data_end - mxbuf->xdp.data;
> > +
> >               skb = mlx5e_build_linear_skb(
> >                       rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
> > -                     mxbuf->xdp.data - mxbuf->xdp.data_hard_start, 0,
> > +                     mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
> >                       mxbuf->xdp.data - mxbuf->xdp.data_meta);
> >               if (unlikely(!skb)) {
> >                       mlx5e_page_release_fragmented(rq->page_pool,
> > @@ -2123,8 +2136,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
> >                       do
> >                               pagep->frags++;
> >                       while (++pagep < frag_page);
> > +
> > +                     headlen = min_t(u16, MLX5E_RX_MAX_HEAD - len, skb->data_len);
> > +                     __pskb_pull_tail(skb, headlen);
> >               }
> > -             __pskb_pull_tail(skb, headlen);
> >       } else {
> >               dma_addr_t addr;
> >
>
