Message-ID: <aYWoBRmt-lcM_JkG@soc-5CG4396X81.clients.intel.com>
Date: Fri, 6 Feb 2026 09:36:21 +0100
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: Vladimir Oltean <vladimir.oltean@....com>, <bpf@...r.kernel.org>, "Claudiu
Manoil" <claudiu.manoil@....com>, Wei Fang <wei.fang@....com>, Clark Wang
<xiaoning.wang@....com>, Andrew Lunn <andrew+netdev@...n.ch>, "David S.
Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Paolo
Abeni" <pabeni@...hat.com>, Tony Nguyen <anthony.l.nguyen@...el.com>,
"Przemek Kitszel" <przemyslaw.kitszel@...el.com>, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Jesper Dangaard
Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>,
"Stanislav Fomichev" <sdf@...ichev.me>, Andrii Nakryiko <andrii@...nel.org>,
"Martin KaFai Lau" <martin.lau@...ux.dev>, Eduard Zingerman
<eddyz87@...il.com>, Song Liu <song@...nel.org>, Yonghong Song
<yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, Hao Luo
<haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Simon Horman
<horms@...nel.org>, Shuah Khan <shuah@...nel.org>, Alexander Lobakin
<aleksander.lobakin@...el.com>, "Maciej Fijalkowski"
<maciej.fijalkowski@...el.com>, "Bastien Curutchet (eBPF Foundation)"
<bastien.curutchet@...tlin.com>, Tushar Vyavahare
<tushar.vyavahare@...el.com>, Jason Xing <kernelxing@...cent.com>, Ricardo
B. Marlière <rbm@...e.com>, Eelco Chaudron
<echaudro@...hat.com>, Lorenzo Bianconi <lorenzo@...nel.org>, "Toke
Hoiland-Jorgensen" <toke@...hat.com>, <imx@...ts.linux.dev>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>, <linux-kselftest@...r.kernel.org>,
Aleksandr Loktionov <aleksandr.loktionov@...el.com>
Subject: Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info frag_size

On Thu, Feb 05, 2026 at 05:54:08PM -0800, Jakub Kicinski wrote:
> On Thu, 5 Feb 2026 15:40:46 +0200 Vladimir Oltean wrote:
> > > > I mean, it should "work" given the caveat that calling bpf_xdp_adjust_tail()
> > > > on a first-half page buffer with a large offset risks leaking into the
> > > > second half, which may also be in use, and this will go undetected, right?
> > > > Although the practical chances of that happening are low, the requested
> > > > offset needs to be in the order of hundreds still.
> > >
> > > Oh, I did get carried away there...
> > > Well, one thing is shared page memory model in enetc and i40e, another thing is
> > > xsk_buff_pool, where chunk size can be between 2K and PAGE_SIZE. What about
> > >
> > > tailroom = rxq->frag_size - skb_frag_size(frag) -
> > > (skb_frag_off(frag) % rxq->frag_size);
> > >
> > > When frag_size is set to 2K, headroom is let's say 256, so aligned DMA write
> > > size is 1420.
> > > last frag at the start of the page: offset=256, size<=1420
> > > tailroom >= 2K - 1420 - 256 = 372
> > > last frag in the middle of the page: offset=256+2K, size<=1420
> > > tailroom >= 2K - 1420 - ((256 + 2K) % 2K) = 372
> > >
> > > And for drivers that do not fragment pages for multi-buffer packets, nothing
> > > changes, since offset is always less than rxq->frag_size.
> > >
> > > This brings us back to rxq->frag_size being half of a page for enetc and i40e,
> > > and seems like in ZC mode it should be pool->chunk_size to work properly.
> >
> > With skb_frag_off() taken into account modulo 2K for the tailroom
> > calculation, I can confirm bpf_xdp_frags_increase_tail() works well for
> > ENETC. I haven't fully considered the side effects, though.
>
> +1, also seems to me like it would work tho I haven't thought thru all
> the cases. We do need to document and name things well, tho, 'cause
> subtleties are piling up ;) Maybe it's time for an ASCII art
> for xdp layout?
>

Yeah, for AF_XDP mbuf in i40e we actually recently discovered another issue
related to buffer size calculation, so some visual aid would be useful. I will
think about how it should look.

> FWIW my feeling is that instead of nickel and diming leftover space
> in the frags if someone actually cared about growing mbufs we should
> have the helper allocate a new page from the PP and append it to the
> shinfo. Much simpler, "infinite space", and works regardless of the
> driver. I don't mean that to suggest you implement it, purely to point
> out that I think nobody really uses positive offsets.. So we can as
> well switch more complicated drivers back to xdp_rxq_info_reg().
>

As Vladimir has mentioned, if the driver does not use header split, frags will
have tailroom the size of skb_shared_info, so growing the tail does work in
practice.

Allocating a page_pool buffer (given the XDP queue has one attached) is
certainly an option, although I am not sure anyone needs it. Furthermore,
growing the tail would still fail in the single-buffer case.
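
For reference, the modulo-based tailroom formula quoted above can be checked
with a minimal userspace sketch that reproduces the 2K example. This is not
kernel code: frag_tailroom() is a hypothetical helper name, the plain integer
arguments stand in for rxq->frag_size, skb_frag_off() and skb_frag_size(), and
the clamp to zero is an assumption added here to avoid unsigned wrap-around.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the proposed calculation:
 *
 *   tailroom = frag_size - frag_len - (frag_off % frag_size)
 *
 * frag_size: size of one buffer chunk (stands in for rxq->frag_size)
 * frag_off:  offset of the fragment within its page
 * frag_len:  number of bytes currently used by the fragment
 */
static unsigned int frag_tailroom(unsigned int frag_size,
                                  unsigned int frag_off,
                                  unsigned int frag_len)
{
    unsigned int used;

    /* A zero chunk size would make the modulo undefined. */
    assert(frag_size != 0);

    /* Offset relative to the start of the chunk, plus the data itself. */
    used = (frag_off % frag_size) + frag_len;

    /* Assumption: report no tailroom instead of wrapping around. */
    return used < frag_size ? frag_size - used : 0;
}
```

With frag_size = 2048, headroom = 256 and a 1420-byte fragment,
frag_tailroom(2048, 256, 1420) and frag_tailroom(2048, 256 + 2048, 1420) both
give 372, matching the numbers above. Assuming a 4K page and frag_size set to
the whole page, the same first-half fragment would report 2420 bytes, which
overlaps the second half of the page.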