Message-ID: <e229a820-e2e0-4dc7-a6c2-03ad6f2bdac3@nvidia.com>
Date: Tue, 10 Feb 2026 18:27:48 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Vladimir Oltean <vladimir.oltean@....com>,
Jakub Kicinski <kuba@...nel.org>
Cc: Larysa Zaremba <larysa.zaremba@...el.com>, bpf@...r.kernel.org,
Claudiu Manoil <claudiu.manoil@....com>, Wei Fang <wei.fang@....com>,
Clark Wang <xiaoning.wang@....com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
<eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
"Bastien Curutchet (eBPF Foundation)" <bastien.curutchet@...tlin.com>,
Tushar Vyavahare <tushar.vyavahare@...el.com>,
Jason Xing <kernelxing@...cent.com>, Ricardo B. Marlière <rbm@...e.com>, Eelco Chaudron <echaudro@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Toke Hoiland-Jorgensen <toke@...hat.com>, imx@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
intel-wired-lan@...ts.osuosl.org, linux-kselftest@...r.kernel.org,
Aleksandr Loktionov <aleksandr.loktionov@...el.com>,
Tariq Toukan <tariqt@...dia.com>, Nimrod Oren <noren@...dia.com>
Subject: Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info
frag_size
On 08.02.26 13:59, Vladimir Oltean wrote:
> On Thu, Feb 05, 2026 at 05:54:08PM -0800, Jakub Kicinski wrote:
>> FWIW my feeling is that instead of nickel and diming leftover space
>> in the frags if someone actually cared about growing mbufs we should
>> have the helper allocate a new page from the PP and append it to the
>> shinfo. Much simpler, "infinite space", and works regardless of the
>> driver. I don't mean that to suggest you implement it, purely to point
>> out that I think nobody really uses positive offsets.. So we can as
>> well switch more complicated drivers back to xdp_rxq_info_reg().
>
> FWIW, I do have a use case at least in the theoretical sense for
> bpf_xdp_adjust_tail() with positive offsets, although it's still under
> development.
>
> I'm working on a DSA data path library for XDP, and one of the features
> it supports is redirecting from one user port to another, with in-place
> tag modification.
>
> If the path to the egress port goes through a tail-tagging switch but
> the path from the ingress port didn't, bpf_xdp_adjust_tail() with a
> positive offset will be called to make space for the tail tags.
>
Jumping into the conversation a bit late...
We were recently discussing this limitation when trying to add growable
tail support for multi-buf XDP buffers in the mlx5 driver (Striding RQ
mode) after lifting the page_size stride limitation for XDP [1].
It turns out that this is quite complicated to do in this mode with the
existing frag_size configuration...
The issue is that the HW can write a packet in multiple smaller strides
(256B for example), and setting rxq->frag_size to the stride size would
not work for the following reasons:
1) The tailroom formula would yield a negative value, as pointed out by
this series (see the sketch of the current check after this list).
2) Even if this formula were not an issue, frag_size currently means
that the underlying storage of each fragment has a size of frag_size, so
the number of fragments would explode in the driver. That's a no-go.
3) And even if we changed the semantics of frag_size to mean something
else, so that the XDP code used frag_size as the available growth size
of the fragment
(tailroom = skb_frag_off() + frag_size - skb_frag_size())
we'd still be very much in the "nickel and diming" space, with less than
256B to spare.
4) The only sane way to do it would be to use a large stride, but that
would kill the small-packet optimization.
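
For reference, a simplified sketch of the current tailroom check,
paraphrased from bpf_xdp_frags_increase_tail() in net/core/xdp.c (the
XSK handling is omitted and details may differ between kernel versions):

static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
	struct xdp_rxq_info *rxq = xdp->rxq;
	unsigned int tailroom;

	/* cpumap leaves frag_size at 0, so tail growth is refused there */
	if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz)
		return -EOPNOTSUPP;

	/* The formula from point 1): with strides smaller than the frag
	 * data, frag_size < skb_frag_off() + skb_frag_size() and this
	 * (unsigned) subtraction wraps instead of going negative.
	 */
	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
	if (unlikely(offset > tailroom))
		return -EINVAL;

	/* The grown area is zeroed, per bpf_xdp_adjust_tail() semantics */
	memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset);
	skb_frag_size_add(frag, offset);
	sinfo->xdp_frags_size += offset;

	return 0;
}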
So +1 for the direction of having a helper allocate an extra page from
the page_pool instead.
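
Very roughly, something like the sketch below. To be clear, this is
entirely hypothetical: the helper name and the rxq->pp field are made
up for illustration; page_pool_dev_alloc_pages() and
__skb_fill_page_desc_noacc() are the existing kernel APIs:

/* Hypothetical: grow the tail by appending a fresh page_pool page as a
 * new frag, instead of scraping leftover tailroom out of the last frag.
 */
static int xdp_buff_add_frag_from_pp(struct xdp_buff *xdp, unsigned int size)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	struct page *page;

	if (xdp->rxq->mem.type != MEM_TYPE_PAGE_POOL)
		return -EOPNOTSUPP;
	if (sinfo->nr_frags >= MAX_SKB_FRAGS)
		return -ENOMEM;

	/* Assumes the rxq gives us a handle to its page_pool (made up) */
	page = page_pool_dev_alloc_pages(xdp->rxq->pp);
	if (!page)
		return -ENOMEM;

	/* Zero the grown area, matching bpf_xdp_adjust_tail() semantics */
	memset(page_address(page), 0, size);

	__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, page, 0, size);
	sinfo->xdp_frags_size += size;

	return 0;
}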
> I'm not sure about the "regardless of the driver" part of your comment.
> Is it possible to mix and match allocation models and still keep track
> of how each individual page needs to be freed? AFAICS in xdp_return_frame(),
> the mem_type is assumed to be the same for the entire xdp_frame.
>
Wouldn't the allocations happen from the page_pool of the rx queue, so
that the mem_type stays homogeneous? I was initially worried about the
cpumap case, but it does not seem to allow tail growth (frag_size is
initialized to 0 in cpu_map_bpf_prog_run_xdp()).
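
For completeness, from the BPF program side that case just surfaces as
bpf_xdp_adjust_tail() failing on multi-buf packets. An illustrative
(made-up) program, assuming it is loaded as an xdp.frags program so it
sees multi-buf packets:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int grow_tail(struct xdp_md *ctx)
{
	/* Ask for 16 extra tail bytes. On a path where the rxq has
	 * frag_size == 0 (e.g. cpumap), this fails with -EOPNOTSUPP
	 * for multi-buf packets.
	 */
	if (bpf_xdp_adjust_tail(ctx, 16) < 0)
		return XDP_DROP; /* arbitrary policy for the example */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";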
[1] Disclaimer: XDP multi-buf fragment growth is still supported for
the non-Striding RQ mode.
Thanks,
Dragos