Message-ID: <CAMB2axPifKfJxOgoB0m_WejugTKw8yJ1Zu+w0vo+Sz5c4FvNug@mail.gmail.com>
Date: Thu, 18 Sep 2025 13:19:04 -0700
From: Amery Hung <ameryhung@...il.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, alexei.starovoitov@...il.com, 
	andrii@...nel.org, daniel@...earbox.net, paul.chaignon@...il.com, 
	kuba@...nel.org, stfomichev@...il.com, martin.lau@...nel.org, 
	mohsin.bashr@...il.com, noren@...dia.com, dtatulea@...dia.com, 
	saeedm@...dia.com, tariqt@...dia.com, mbloch@...dia.com, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v4 2/6] bpf: Support pulling non-linear xdp data

On Thu, Sep 18, 2025 at 10:56 AM Amery Hung <ameryhung@...il.com> wrote:
>
> On Thu, Sep 18, 2025 at 2:11 AM Maciej Fijalkowski
> <maciej.fijalkowski@...el.com> wrote:
> >
> > On Wed, Sep 17, 2025 at 03:55:09PM -0700, Amery Hung wrote:
> > > Add kfunc, bpf_xdp_pull_data(), to support pulling data from xdp
> > > fragments. Similar to bpf_skb_pull_data(), bpf_xdp_pull_data() makes
> > > the first len bytes of data directly readable and writable in bpf
> > > programs. If the "len" argument is larger than the linear data size,
> > > data in fragments will be copied to the linear data area when there
> > > is enough room. Specifically, the kfunc will try to use the tailroom
> > > first. When the tailroom is not enough, metadata and data will be
> > > shifted down to make room for pulling data.
> > >
> > > A use case of the kfunc is to decapsulate headers residing in xdp
> > > fragments. It is possible for a NIC driver to place headers in xdp
> > > fragments. To keep using direct packet access for parsing and
> > > decapsulating headers, users can pull headers into the linear data
> > > area by calling bpf_xdp_pull_data() and then pop the header with
> > > bpf_xdp_adjust_head().
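
(Sketch for illustration only, not part of this series: a minimal frag-aware
XDP program using the pull-then-pop flow described above. The 20-byte outer
header size, section/program names and the kfunc extern declaration are
assumptions following the usual selftest conventions.)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_pull_data(struct xdp_md *xdp, __u32 len) __ksym;

SEC("xdp.frags")
int xdp_decap(struct xdp_md *ctx)
{
	const __u32 outer_hdr_len = 20;	/* assumed encap header size */

	/* Make sure the outer header sits in the linear data area. */
	if (bpf_xdp_pull_data(ctx, outer_hdr_len))
		return XDP_DROP;

	/* Packet pointers must be re-derived after the kfunc call. */
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	if (data + outer_hdr_len > data_end)
		return XDP_DROP;

	/* ... parse/validate the outer header via direct packet access ... */

	/* Pop the outer header. */
	if (bpf_xdp_adjust_head(ctx, outer_hdr_len))
		return XDP_DROP;

	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";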
> > >
> > > Reviewed-by: Jakub Kicinski <kuba@...nel.org>
> > > Signed-off-by: Amery Hung <ameryhung@...il.com>
> > > ---
> > >  net/core/filter.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 91 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 0b82cb348ce0..0e8d63bf1d30 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -12212,6 +12212,96 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> > >       return 0;
> > >  }
> > >
> > > +/**
> > > + * bpf_xdp_pull_data() - Pull in non-linear xdp data.
> > > + * @x: &xdp_md associated with the XDP buffer
> > > + * @len: length of data to be made directly accessible in the linear part
> > > + *
> > > + * Pull in data in case the XDP buffer associated with @x is non-linear and
> > > + * not all of the first @len bytes are in the linear data area.
> > > + *
> > > + * Direct packet access allows reading and writing linear XDP data through
> > > + * packet pointers (i.e., &xdp_md->data + offsets). The amount of data which
> > > + * ends up in the linear part of the xdp_buff depends on the NIC and its
> > > + * configuration. When a frag-capable XDP program wants to directly access
> > > + * headers that may be in the non-linear area, call this kfunc to make sure
> > > + * the data is available in the linear area. Alternatively, use dynptr or
> > > + * bpf_xdp_{load,store}_bytes() to access data without pulling.
> > > + *
> > > + * This kfunc can also be used with bpf_xdp_adjust_head() to decapsulate
> > > + * headers in the non-linear data area.
> > > + *
> > > + * A call to this kfunc may reduce headroom. If there is not enough tailroom
> > > + * in the linear data area, metadata and data will be shifted down.
> > > + *
> > > + * A call to this kfunc may change the buffer geometry.
> > > + * Therefore, at load time, all checks on pointers previously done by the
> > > + * verifier are invalidated and must be performed again, if the kfunc is used
> > > + * in combination with direct packet access.
> > > + *
> > > + * Return:
> > > + * * %0         - success
> > > + * * %-EINVAL   - invalid len
> > > + */
> > > +__bpf_kfunc int bpf_xdp_pull_data(struct xdp_md *x, u32 len)
> > > +{
> > > +     struct xdp_buff *xdp = (struct xdp_buff *)x;
> > > +     struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > > +     int i, delta, shift, headroom, tailroom, n_frags_free = 0;
> > > +     void *data_hard_end = xdp_data_hard_end(xdp);
> > > +     int data_len = xdp->data_end - xdp->data;
> > > +     void *start;
> > > +
> > > +     if (len <= data_len)
> > > +             return 0;
> > > +
> > > +     if (unlikely(len > xdp_get_buff_len(xdp)))
> > > +             return -EINVAL;
> > > +
> > > +     start = xdp_data_meta_unsupported(xdp) ? xdp->data : xdp->data_meta;
> > > +
> > > +     headroom = start - xdp->data_hard_start - sizeof(struct xdp_frame);
> > > +     tailroom = data_hard_end - xdp->data_end;
> > > +
> > > +     delta = len - data_len;
> > > +     if (unlikely(delta > tailroom + headroom))
> > > +             return -EINVAL;
> > > +
> > > +     shift = delta - tailroom;
> > > +     if (shift > 0) {
> > > +             memmove(start - shift, start, xdp->data_end - start);
> > > +
> > > +             xdp->data_meta -= shift;
> > > +             xdp->data -= shift;
> > > +             xdp->data_end -= shift;
> > > +     }
> > > +
> > > +     for (i = 0; i < sinfo->nr_frags && delta; i++) {
> > > +             skb_frag_t *frag = &sinfo->frags[i];
> > > +             u32 shrink = min_t(u32, delta, skb_frag_size(frag));
> > > +
> > > +             memcpy(xdp->data_end, skb_frag_address(frag), shrink);
> > > +
> > > +             xdp->data_end += shrink;
> > > +             sinfo->xdp_frags_size -= shrink;
> > > +             delta -= shrink;
> > > +             if (bpf_xdp_shrink_data(xdp, frag, shrink, false))
> > > +                     n_frags_free++;
> > > +     }
> > > +
> > > +     if (unlikely(n_frags_free)) {
> > > +             memmove(sinfo->frags, sinfo->frags + n_frags_free,
> > > +                     (sinfo->nr_frags - n_frags_free) * sizeof(skb_frag_t));
> > > +
> > > +             sinfo->nr_frags -= n_frags_free;
> > > +
> > > +             if (!sinfo->nr_frags)
> > > +                     xdp_buff_clear_frags_flag(xdp);
> >
> > Nit: should we take care of pfmemalloc flag as well?
> >
>
> Does that mean we should also clear this bit in bpf_xdp_adjust_tail()?

Looking at the code, it makes sense to clear it, although it currently
makes no difference: all readers of this flag only check it when
fragments are present.
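
For reference, a rough sketch of what clearing the bit could look like
(untested; xdp_buff_clear_frag_pfmemalloc() is a made-up helper that would
sit next to xdp_buff_clear_frags_flag() in include/net/xdp.h):

/* include/net/xdp.h */
static __always_inline void
xdp_buff_clear_frag_pfmemalloc(struct xdp_buff *xdp)
{
	xdp->flags &= ~XDP_FLAGS_FRAG_PFMEMALLOC;
}

/* net/core/filter.c, bpf_xdp_pull_data() (and possibly bpf_xdp_adjust_tail()) */
	if (!sinfo->nr_frags) {
		xdp_buff_clear_frags_flag(xdp);
		xdp_buff_clear_frag_pfmemalloc(xdp);
	}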

>
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +
> > >  __bpf_kfunc_end_defs();
> > >
> > >  int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > > @@ -12239,6 +12329,7 @@ BTF_KFUNCS_END(bpf_kfunc_check_set_skb_meta)
> > >
> > >  BTF_KFUNCS_START(bpf_kfunc_check_set_xdp)
> > >  BTF_ID_FLAGS(func, bpf_dynptr_from_xdp)
> > > +BTF_ID_FLAGS(func, bpf_xdp_pull_data)
> > >  BTF_KFUNCS_END(bpf_kfunc_check_set_xdp)
> > >
> > >  BTF_KFUNCS_START(bpf_kfunc_check_set_sock_addr)
> > > --
> > > 2.47.3
> > >
