[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260205124638.hxzvjiocephzlrk3@skbuf>
Date: Thu, 5 Feb 2026 14:46:38 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: Larysa Zaremba <larysa.zaremba@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, bpf@...r.kernel.org,
Claudiu Manoil <claudiu.manoil@....com>,
Wei Fang <wei.fang@....com>, Clark Wang <xiaoning.wang@....com>,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Simon Horman <horms@...nel.org>,
Shuah Khan <shuah@...nel.org>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
"Bastien Curutchet (eBPF Foundation)" <bastien.curutchet@...tlin.com>,
Tushar Vyavahare <tushar.vyavahare@...el.com>,
Jason Xing <kernelxing@...cent.com>,
Ricardo B. Marlière <rbm@...e.com>,
Eelco Chaudron <echaudro@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Toke Hoiland-Jorgensen <toke@...hat.com>, imx@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
intel-wired-lan@...ts.osuosl.org, linux-kselftest@...r.kernel.org,
Aleksandr Loktionov <aleksandr.loktionov@...el.com>
Subject: Re: [PATCH bpf 6/6] net: enetc: use truesize as XDP RxQ info
frag_size
On Thu, Feb 05, 2026 at 01:41:03PM +0100, Larysa Zaremba wrote:
> On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> > On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > > Thanks! This is an extremely subtle corner case. I appreciate the patch
> > > > and explanation.
> > > >
> > > > I did run tests on the blamed commit (which I still have), but to catch
> > > > a real issue in a meaningful way it would have been required to have a
> > > > program which calls bpf_xdp_adjust_tail() with a very large offset.
> > > > I'm noting that I'm seeing the WARN_ON() much easier after your fix, but
> > > > before, it was mostly inconsequential for practical cases.
> > > >
> > > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is 256.
> > > > First buffers also contain the skb_shared_info (320 bytes), while
> > > > subsequent buffers don't.
> > >
> > > I can't wrap my head around this series, hope you can tell me where I'm
> > > going wrong. AFAICT enetc splits the page into two halves for small MTU.
> > >
> > > So we have
> > >
> > > | 2k | 2k |
> > > ----------------------------- -----------------------------
> > > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > > ----------------------------- -----------------------------
> > >
> > > If we attach the second chunk as frag well have:
> > > offset = 2k + hroom
> > > size = data.len
> > > But we use
> > > truesize / frag_size = 2k
> > > so
> > > tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > > tailroom = 2k - data.len - 2k
> > > tailroom = -data.len
> > > WARN(tailroom < 0) -> yes
> > >
> > > The frag_size thing is unusable for any driver that doesn't hand out
> > > full pages to frags?
> >
> > This is an excellent question.
> >
> > Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50% chance
> > of working - the paged data has to be in the first half of the page,
> > otherwise the tailroom calculations are not correct due to rxq->frag_size,
> > and the WARN_ON() will trigger.
> >
> > The reason why I didn't notice this during my testing is stupid. I was
> > attaching the BPF program to the interface and then detaching it after
> > each test, and each test was sending less than the RX ring size (2048)
> > worth of packets. So all multi-buffer frames were using buffers which
> > were fresh out of enetc_setup_rxbdr() -> ... -> enetc_new_page() (first
> > halves) and never out of flipped pages (enetc_bulk_flip_buff()).
> >
> > This seems to be a good reason to convert this driver to use page pool,
> > which I can look into. I'm not sure that there's anything that can be
> > done to make the rxq->frag_size mechanism compatible with the current
> > buffer allocation scheme.
>
> I was just about to send an answer.
>
> Seems like my mistake here. I actually think adjusting the tail should work, if
> we set rxq->frag_size to PAGE_SIZE in enetc and i40e_rx_pg_size() in i40e, and
> not to (PAGE_SIZE / 2), as I did at first, but in such case naming this
> frag_size is just utterly wrong. Glad Jakub has pointed this out.
I mean, it should "work" given the caveat that calling bpf_xdp_adjust_tail()
on a first-half page buffer with a large offset risks leaking into the
second half, which may also be in use, and this will go undetected, right?
Although the practical chances of that happening are low, the requested
offset needs to be in the order of hundreds still.
> ice and idpf are fine, since they use libeth for Rx buffers, so mbuf packets
> always reside in 3K+ buffers. But for xsk_buff_pool seems like all drivers
> should have PAGE_SIZE as frag_size? I will let the discussion go on for at least
> a day and then will send v2 with patches reordered and those sizes corrected,
> maybe add ZC fixes on top.
Powered by blists - more mailing lists