[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aYSP344FT-JlylwY@soc-5CG4396X81.clients.intel.com>
Date: Thu, 5 Feb 2026 13:41:03 +0100
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Vladimir Oltean <vladimir.oltean@....com>, Jakub Kicinski
<kuba@...nel.org>
CC: <bpf@...r.kernel.org>, Claudiu Manoil <claudiu.manoil@....com>, Wei Fang
<wei.fang@....com>, Clark Wang <xiaoning.wang@....com>, Andrew Lunn
<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, "Eric
Dumazet" <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Tony Nguyen
<anthony.l.nguyen@...el.com>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend
<john.fastabend@...il.com>, Stanislav Fomichev <sdf@...ichev.me>, "Andrii
Nakryiko" <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
"Eduard Zingerman" <eddyz87@...il.com>, Song Liu <song@...nel.org>, Yonghong
Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, Hao Luo
<haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, Simon Horman
<horms@...nel.org>, Shuah Khan <shuah@...nel.org>, Alexander Lobakin
<aleksander.lobakin@...el.com>, Maciej Fijalkowski
<maciej.fijalkowski@...el.com>, "Bastien Curutchet (eBPF Foundation)"
<bastien.curutchet@...tlin.com>, Tushar Vyavahare
<tushar.vyavahare@...el.com>, Jason Xing <kernelxing@...cent.com>, Ricardo
B. Marlière <rbm@...e.com>, Eelco Chaudron
<echaudro@...hat.com>, Lorenzo Bianconi <lorenzo@...nel.org>, "Toke
Hoiland-Jorgensen" <toke@...hat.com>, <imx@...ts.linux.dev>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<intel-wired-lan@...ts.osuosl.org>, <linux-kselftest@...r.kernel.org>,
Aleksandr Loktionov <aleksandr.loktionov@...el.com>
Subject: Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info
frag_size
On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > Thanks! This is an extremely subtle corner case. I appreciate the patch
> > > and explanation.
> > >
> > > I did run tests on the blamed commit (which I still have), but to catch
> > > a real issue in a meaningful way it would have been required to have a
> > > program which calls bpf_xdp_adjust_tail() with a very large offset.
> > > I'm noting that I'm seeing the WARN_ON() much easier after your fix, but
> > > before, it was mostly inconsequential for practical cases.
> > >
> > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is 256.
> > > First buffers also contain the skb_shared_info (320 bytes), while
> > > subsequent buffers don't.
> >
> > I can't wrap my head around this series, hope you can tell me where I'm
> > going wrong. AFAICT enetc splits the page into two halves for small MTU.
> >
> > So we have
> >
> > | 2k | 2k |
> > ----------------------------- -----------------------------
> > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > ----------------------------- -----------------------------
> >
> > If we attach the second chunk as frag well have:
> > offset = 2k + hroom
> > size = data.len
> > But we use
> > truesize / frag_size = 2k
> > so
> > tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > tailroom = 2k - data.len - 2k
> > tailroom = -data.len
> > WARN(tailroom < 0) -> yes
> >
> > The frag_size thing is unusable for any driver that doesn't hand out
> > full pages to frags?
>
> This is an excellent question.
>
> Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50% chance
> of working - the paged data has to be in the first half of the page,
> otherwise the tailroom calculations are not correct due to rxq->frag_size,
> and the WARN_ON() will trigger.
>
> The reason why I didn't notice this during my testing is stupid. I was
> attaching the BPF program to the interface and then detaching it after
> each test, and each test was sending less than the RX ring size (2048)
> worth of packets. So all multi-buffer frames were using buffers which
> were fresh out of enetc_setup_rxbdr() -> ... -> enetc_new_page() (first
> halves) and never out of flipped pages (enetc_bulk_flip_buff()).
>
> This seems to be a good reason to convert this driver to use page pool,
> which I can look into. I'm not sure that there's anything that can be
> done to make the rxq->frag_size mechanism compatible with the current
> buffer allocation scheme.
I was just about to send an answer.
Seems like my mistake here. I actually think adjusting the tail should work, if
we set rxq->frag_size to PAGE_SIZE in enetc and i40e_rx_pg_size() in i40e, and
not to (PAGE_SIZE / 2), as I did at first, but in such case naming this
frag_size is just utterly wrong. Glad Jakub has pointed this out.
ice and idpf are fine, since they use libeth for Rx buffers, so mbuf packets
always reside in 3K+ buffers. But for xsk_buff_pool seems like all drivers
should have PAGE_SIZE as frag_size? I will let the discussion go on for at least
a day and then will send v2 with patches reordered and those sizes corrected,
maybe add ZC fixes on top.
Powered by blists - more mailing lists