Message-ID: <e229a820-e2e0-4dc7-a6c2-03ad6f2bdac3@nvidia.com>
Date: Tue, 10 Feb 2026 18:27:48 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Vladimir Oltean <vladimir.oltean@....com>,
Jakub Kicinski <kuba@...nel.org>
Cc: Larysa Zaremba <larysa.zaremba@...el.com>, bpf@...r.kernel.org,
Claudiu Manoil <claudiu.manoil@....com>, Wei Fang <wei.fang@....com>,
Clark Wang <xiaoning.wang@....com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
<eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
"Bastien Curutchet (eBPF Foundation)" <bastien.curutchet@...tlin.com>,
Tushar Vyavahare <tushar.vyavahare@...el.com>,
Jason Xing <kernelxing@...cent.com>, Ricardo B. Marlière <rbm@...e.com>, Eelco Chaudron <echaudro@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Toke Hoiland-Jorgensen <toke@...hat.com>, imx@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
intel-wired-lan@...ts.osuosl.org, linux-kselftest@...r.kernel.org,
Aleksandr Loktionov <aleksandr.loktionov@...el.com>,
Tariq Toukan <tariqt@...dia.com>, Nimrod Oren <noren@...dia.com>
Subject: Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info
frag_size
On 08.02.26 13:59, Vladimir Oltean wrote:
> On Thu, Feb 05, 2026 at 05:54:08PM -0800, Jakub Kicinski wrote:
>> FWIW my feeling is that instead of nickel and diming leftover space
>> in the frags if someone actually cared about growing mbufs we should
>> have the helper allocate a new page from the PP and append it to the
>> shinfo. Much simpler, "infinite space", and works regardless of the
>> driver. I don't mean that to suggest you implement it, purely to point
>> out that I think nobody really uses positive offsets.. So we can as
>> well switch more complicated drivers back to xdp_rxq_info_reg().
>
> FWIW, I do have a use case at least in the theoretical sense for
> bpf_xdp_adjust_tail() with positive offsets, although it's still under
> development.
>
> I'm working on a DSA data path library for XDP, and one of the features
> it supports is redirecting from one user port to another, with in-place
> tag modification.
>
> If the path to the egress port goes through a tail-tagging switch but
> the path from the ingress port didn't, bpf_xdp_adjust_tail() with a
> positive offset will be called to make space for the tail tags.
>
Jumping into the conversation a bit late...
We were recently discussing this limitation when trying to add growable
tail support for multi-buf XDP buffers in the mlx5 driver (Striding RQ
mode) after lifting the page_size stride limitation for XDP [1].
It turns out that this is quite complicated to do in this mode with the
existing frag_size configuration...
The issue is that the HW can write a packet in multiple smaller strides
(256B for example), and setting rxq->frag_size to the stride size would
not work for the following reasons:
1) The tailroom formula would yield a negative value, as pointed out by
this series (see the sketch of the current check after this list).
2) Even if this formula were not an issue, frag_size currently means
that the underlying storage of each fragment has a size of frag_size, so
the number of fragments would explode in the driver. That's a no-go.
3) And even if we changed the semantics of frag_size to mean something
else, so that the XDP code used frag_size as the available growth size
of the fragment
(tailroom = skb_frag_off() + frag_size - skb_frag_size())
we'd still be very much in the "nickel and diming" space, with less than
256B to spare.
4) The only sane way to do it would be to use a large stride, but that
would kill the small-packet optimization.
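
For reference, a simplified sketch of the current tailroom check,
paraphrased from bpf_xdp_frags_increase_tail() in net/core/xdp.c (the
XSK handling is omitted and details may differ between kernel versions):

static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
	struct xdp_rxq_info *rxq = xdp->rxq;
	unsigned int tailroom;

	/* cpumap leaves frag_size at 0, so tail growth is refused there */
	if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz)
		return -EOPNOTSUPP;

	/* The formula from point 1): with strides smaller than the frag
	 * data, frag_size < skb_frag_off() + skb_frag_size() and this
	 * (unsigned) subtraction wraps instead of going negative.
	 */
	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
	if (unlikely(offset > tailroom))
		return -EINVAL;

	/* The grown area is zeroed, per bpf_xdp_adjust_tail() semantics */
	memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset);
	skb_frag_size_add(frag, offset);
	sinfo->xdp_frags_size += offset;

	return 0;
}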
So +1 for the direction of having a helper allocate an extra page from
the page_pool instead.
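
Very roughly, something like the sketch below. To be clear, this is
entirely hypothetical: the helper name and the rxq->pp field are made
up for illustration; page_pool_dev_alloc_pages() and
__skb_fill_page_desc_noacc() are the existing kernel APIs:

/* Hypothetical: grow the tail by appending a fresh page_pool page as a
 * new frag, instead of scraping leftover tailroom out of the last frag.
 */
static int xdp_buff_add_frag_from_pp(struct xdp_buff *xdp, unsigned int size)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	struct page *page;

	if (xdp->rxq->mem.type != MEM_TYPE_PAGE_POOL)
		return -EOPNOTSUPP;
	if (sinfo->nr_frags >= MAX_SKB_FRAGS)
		return -ENOMEM;

	/* Assumes the rxq gives us a handle to its page_pool (made up) */
	page = page_pool_dev_alloc_pages(xdp->rxq->pp);
	if (!page)
		return -ENOMEM;

	/* Zero the grown area, matching bpf_xdp_adjust_tail() semantics */
	memset(page_address(page), 0, size);

	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, page, 0, size);
	sinfo->xdp_frags_size += size;

	return 0;
}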
> I'm not sure about the "regardless of the driver" part of your comment.
> Is it possible to mix and match allocation models and still keep track
> of how each individual page needs to be freed? AFAICS in xdp_return_frame(),
> the mem_type is assumed to be the same for the entire xdp_frame.
>
Wouldn't the allocations happen from the page_pool of the rx queue, so
that the mem_type stays homogeneous? I was initially worried about the
cpumap case, but it does not seem to allow tail growth (frag_size is
initialized to 0 in cpu_map_bpf_prog_run_xdp()).
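
For completeness, from the BPF program side that case just surfaces as
bpf_xdp_adjust_tail() failing on multi-buf packets. An illustrative
(made-up) program, assuming it is loaded as an xdp.frags program so it
sees multi-buf packets:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int grow_tail(struct xdp_md *ctx)
{
	/* Ask for 16 extra tail bytes. On a path where the rxq has
	 * frag_size == 0 (e.g. cpumap), this fails with -EOPNOTSUPP
	 * for multi-buf packets.
	 */
	if (bpf_xdp_adjust_tail(ctx, 16) < 0)
		return XDP_DROP; /* arbitrary policy for the example */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";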
[1] Disclaimer: XDP multi-buf fragment growth is still supported for
the non-Striding RQ mode.
Thanks,
Dragos