Message-ID: <CAHS8izOnzxbSuW5=aiTAUja7D2ARgtR13qYWr-bXNYSCvm5Bbg@mail.gmail.com>
Date: Wed, 15 Oct 2025 10:44:19 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Pavel Begunkov <asml.silence@...il.com>, netdev@...r.kernel.org, 
	Andrew Lunn <andrew@...n.ch>, davem@...emloft.net, Eric Dumazet <edumazet@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Donald Hunter <donald.hunter@...il.com>, Michael Chan <michael.chan@...adcom.com>, 
	Pavan Chebbi <pavan.chebbi@...adcom.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	John Fastabend <john.fastabend@...il.com>, Stanislav Fomichev <sdf@...ichev.me>, 
	Joshua Washington <joshwash@...gle.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>, 
	Jian Shen <shenjian15@...wei.com>, Salil Mehta <salil.mehta@...wei.com>, 
	Jijie Shao <shaojijie@...wei.com>, Sunil Goutham <sgoutham@...vell.com>, 
	Geetha sowjanya <gakula@...vell.com>, Subbaraya Sundeep <sbhatta@...vell.com>, 
	hariprasad <hkelam@...vell.com>, Bharat Bhushan <bbhushan2@...vell.com>, 
	Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>, 
	Leon Romanovsky <leon@...nel.org>, Alexander Duyck <alexanderduyck@...com>, kernel-team@...a.com, 
	Ilias Apalodimas <ilias.apalodimas@...aro.org>, Joe Damato <joe@...a.to>, David Wei <dw@...idwei.uk>, 
	Willem de Bruijn <willemb@...gle.com>, Breno Leitao <leitao@...ian.org>, 
	Dragos Tatulea <dtatulea@...dia.com>, linux-kernel@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-rdma@...r.kernel.org, 
	Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH net-next v4 00/24][pull request] Queue configs and large
 buffer providers

On Tue, Oct 14, 2025 at 6:41 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon, 13 Oct 2025 21:41:38 -0700 Mina Almasry wrote:
> > > I'd like to rework these a little bit.
> > > On reflection I don't like the single size control.
> > > Please hold off.
> >
> > FWIW when I last looked at this I didn't like that the size control
> > seemed to control the size of the allocations made from the pp, but
> > not the size actually posted to the NIC.
> >
> > I.e. in the scenario where the driver fragments each pp buffer into 2,
> > and the user asks for 8K rx-buf-len, the size actually posted to the
> > NIC would have been 4K (8K / 2 for 2 fragments).
> >
> > Not sure how much of a concern this really is. I thought it would be
> > great if somehow rx-buf-len controlled the buffer sizes actually
> > posted to the NIC, because that's what ultimately matters, no (it ends
> > up being the size of the incoming frags)? Or does that not matter for
> > some reason I'm missing?
>
> I spent a couple of hours trying to write up my thoughts but I still haven't
> finished 😅️ I'll send the full thing tomorrow.
>
> You may have looked at hns3, is that right? It bumps the page pool order
> by 1 so that it can fit two allocations into each page. I'm guessing
> it's a remnant of "page flipping". The other current user of rx-buf-len
> (otx2) doesn't do that - it uses a simple page_order(rx_buf_len), AFAICT.
> If that's what you mean - I'd chalk the hns3 behavior up to "historical
> reasons"; it can probably be straightened out today to everyone's
> benefit.
>
> I wanted to reply already (before I present my "full case" :)) because
> my thinking started slipping in the opposite direction of being
> concerned about "buffer sizes actually posted to the NIC".
> Say the NIC packs packet payloads into buffers like this:
>
>           1          2     3
> packets:  xxxxxxxxx  yyyy  zzzzzzz
> buffers:  [xxxx] [xxxx] [x|yyy] [y|zzz] [zzzz]
>
> Hope the diagram makes sense, each [....] is 4k, headers went elsewhere.
>
> Say the user filled the page pool with 16k buffers, and the driver split
> them up into 4k chunks. HW packed the payloads into those 4k chunks,
> and GRO reformed them back into just 2 skb frags. Do we really care
> about the buffer size on the HW fill ring being 4kB? Isn't what the
> user cares about that they saw 2 frags, not 5?

I think what you're saying is what I was trying to say, but you said
it more eloquently and more accurately. I'm not familiar with the
GRO packing you're referring to, so I just assumed the 'buffer sizes
actually posted to the NIC' are the 'buffer sizes we end up seeing in
the skb frags'.
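
For my own understanding, here's roughly the kind of frag coalescing I
assume you mean - purely illustrative, not code from this series (skb,
page, offset, len, truesize are just placeholders):

	/* If the next chunk the HW filled is physically contiguous with
	 * the last frag already in the skb, fold it into that frag
	 * instead of appending a new one - that's how 5 HW buffers can
	 * end up as only 2 skb frags.
	 */
	if (skb_can_coalesce(skb, skb_shinfo(skb)->nr_frags, page, offset))
		skb_coalesce_rx_frag(skb, skb_shinfo(skb)->nr_frags - 1,
				     len, truesize);
	else
		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
				offset, len, truesize);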

I guess what I'm trying to say, in a different way, is: there are lots
of buffer sizes in the rx path, AFAICT, at least:

1. The size of the netmems allocated from the pp.
2. The size of the buffers posted to the NIC (which will differ from #1
if the driver uses page_pool_fragment_netmem or some other trick like
hns3 does - see the rough sketch below).
3. The size of the frags that end up in the skb (which may differ from
#2 due to GRO/other things I don't fully understand).
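
The rough sketch I mentioned in #2, just to make the #1 vs #2 split
concrete - purely illustrative and not from the series (ring_size and
dev are placeholders):

	/* #1: the pp hands out 16K (order-2) allocations... */
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 2,	/* 16K netmems from the pp (#1) */
		.pool_size	= ring_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);

	/* #2: ...but the driver carves each one into 4K chunks and posts
	 * those to the HW fill ring, so the NIC only ever sees 4K buffers.
	 */
	unsigned int offset;
	struct page *page = page_pool_dev_alloc_frag(pool, &offset, SZ_4K);
	dma_addr_t dma = page_pool_get_dma_addr(page) + offset;

	/* 'dma' + 4K goes on the fill ring; #3 is then whatever frag
	 * sizes GRO/coalescing leave in the skb.
	 */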

...and I'm not sure which of these rx-buf-len should actually
configure. My thinking is that it should probably configure #3, since
that's what the user ultimately cares about - I agree with you there.
(In the 8K example above, #1 would be 8K, #2 would be 4K, and #3 would
be whatever GRO ends up producing.)

IIRC, when I last looked at this a few weeks ago, this patch series as
written makes rx-buf-len actually configure #1.

-- 
Thanks,
Mina
