linux-kernel - Re: [PATCH net-next v4 00/24][pull request] Queue configs and large buffer providers


Message-ID: <20251014184119.3ba2dd70@kernel.org>
Date: Tue, 14 Oct 2025 18:41:19 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Mina Almasry <almasrymina@...gle.com>
Cc: Pavel Begunkov <asml.silence@...il.com>, netdev@...r.kernel.org, Andrew
 Lunn <andrew@...n.ch>, davem@...emloft.net, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
 <horms@...nel.org>, Donald Hunter <donald.hunter@...il.com>, Michael Chan
 <michael.chan@...adcom.com>, Pavan Chebbi <pavan.chebbi@...adcom.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend
 <john.fastabend@...il.com>, Stanislav Fomichev <sdf@...ichev.me>, Joshua
 Washington <joshwash@...gle.com>, Harshitha Ramamurthy
 <hramamurthy@...gle.com>, Jian Shen <shenjian15@...wei.com>, Salil Mehta
 <salil.mehta@...wei.com>, Jijie Shao <shaojijie@...wei.com>, Sunil Goutham
 <sgoutham@...vell.com>, Geetha sowjanya <gakula@...vell.com>, Subbaraya
 Sundeep <sbhatta@...vell.com>, hariprasad <hkelam@...vell.com>, Bharat
 Bhushan <bbhushan2@...vell.com>, Saeed Mahameed <saeedm@...dia.com>, Tariq
 Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>, Leon Romanovsky
 <leon@...nel.org>, Alexander Duyck <alexanderduyck@...com>,
 kernel-team@...a.com, Ilias Apalodimas <ilias.apalodimas@...aro.org>, Joe
 Damato <joe@...a.to>, David Wei <dw@...idwei.uk>, Willem de Bruijn
 <willemb@...gle.com>, Breno Leitao <leitao@...ian.org>, Dragos Tatulea
 <dtatulea@...dia.com>, linux-kernel@...r.kernel.org,
 linux-doc@...r.kernel.org, linux-rdma@...r.kernel.org, Jonathan Corbet
 <corbet@....net>
Subject: Re: [PATCH net-next v4 00/24][pull request] Queue configs and large
 buffer providers

On Mon, 13 Oct 2025 21:41:38 -0700 Mina Almasry wrote:
> > I'd like to rework these a little bit.
> > On reflection I don't like the single size control.
> > Please hold off.
>
> FWIW when I last looked at this I didn't like that the size control
> seemed to control the size of the allocations made from the pp, but
> not the size actually posted to the NIC.
> 
> I.e. in the scenario where the driver fragments each pp buffer into 2,
> and the user asks for 8K rx-buf-len, the size actually posted to the
> NIC would have actually been 4K (8K / 2 for 2 fragments).
> 
> Not sure how much of a concern this really is. I thought it would be
> great if somehow rx-buf-len controlled the buffer sizes actually
> posted to the NIC, because that's what ultimately matters, no (it ends
> up being the size of the incoming frags)? Or does that not matter for
> some reason I'm missing?

I spent a couple of hours trying to write up my thoughts but I still haven't
finished 😅️ I'll send the full thing tomorrow.

You may have looked at hns3, is that right? It bumps the page pool order
by 1 so that it can fit two allocations into each page. I'm guessing
it's a remnant of "page flipping". The other current user of rx-buf-len
(otx2) doesn't do that - it uses a simple page_order(rx_buf_len), AFAICT.
If that's what you mean - I'd chalk the hns3 behavior up to "historical
reasons", it can probably be straightened out today to everyone's
benefit.
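
To make the two sizing schemes concrete, here is a rough model of the arithmetic only, not the actual hns3 or otx2 driver code. The function names, the 4k PAGE_SIZE and the tuple layout are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


def order_for(size):
    """Smallest order such that (PAGE_SIZE << order) >= size."""
    order = 0
    while (PAGE_SIZE << order) < size:
        order += 1
    return order


def plain_scheme(rx_buf_len):
    """One buffer per allocation: order derived directly from rx_buf_len."""
    order = order_for(rx_buf_len)
    return order, PAGE_SIZE << order, 1  # (order, bytes allocated, buffers per alloc)


def doubled_scheme(rx_buf_len):
    """Order bumped by 1 so that two buffers fit in each allocation."""
    order = order_for(rx_buf_len) + 1
    return order, PAGE_SIZE << order, 2


def posted_len(alloc_bytes, bufs_per_alloc):
    """Size of each buffer that ends up on the HW fill ring."""
    return alloc_bytes // bufs_per_alloc
```

In this model an 8k rx-buf-len gives an order-1 allocation in the plain scheme and an order-2 allocation holding two 8k buffers in the doubled scheme. The case from the quoted mail, where the 8k allocation itself is split in two, is posted_len(8192, 2), i.e. 4k on the ring.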

I wanted to reply already (before I present my "full case" :)) because
my thinking started slipping in the opposite direction of being
concerned about "buffer sizes actually posted to the NIC". 
Say the NIC packs packet payloads into buffers like this:

          1          2     3      
packets:  xxxxxxxxx  yyyy  zzzzzzz
buffers:  [xxxx] [xxxx] [x|yyy] [y|zzz] [zzzz]

Hope the diagram makes sense, each [....] is 4k, headers went elsewhere.

Say the user filled the page pool with 16k buffers, and the driver split
them up into 4k chunks. HW packed the payloads into those 4k chunks,
and GRO reformed them back into just 2 skb frags. Do we really care
about the buffer size on the HW fill ring being 4kB? Isn't what the user
cares about that they saw 2 frags, not 5?
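
The frag counts in the example above can be checked with a small simulation. It is a sketch under stated assumptions, not kernel code: each character in the diagram is taken as 1k of payload, chunks are carved from the pool buffers and posted in address order, all three packets are merged into one skb, and two pieces of data become one frag only when they are contiguous inside the same pool buffer. The names pack() and coalesce() are made up for this illustration.

```python
CHUNK = 4096  # size of each chunk on the HW fill ring


def pack(payload_lens, chunk=CHUNK):
    """Pack payloads back to back into fixed-size chunks.

    Returns a list of (chunk_index, offset_in_chunk, length) segments.
    """
    segs = []
    pos = 0  # running write position across all chunks
    for n in payload_lens:
        while n > 0:
            idx, off = divmod(pos, chunk)
            take = min(n, chunk - off)
            segs.append((idx, off, take))
            pos += take
            n -= take
    return segs


def coalesce(segs, pool_buf, chunk=CHUNK):
    """Merge segments that are contiguous within the same pool buffer.

    Returns a list of [pool_buf_index, offset_in_buf, length] frags.
    """
    frags = []
    for idx, off, length in segs:
        # Assumption: chunk i sits at linear address i * chunk.
        buf, buf_off = divmod(idx * chunk + off, pool_buf)
        last = frags[-1] if frags else None
        if last and last[0] == buf and last[1] + last[2] == buf_off:
            last[2] += length  # extends the previous frag
        else:
            frags.append([buf, buf_off, length])
    return frags


payloads = [9 * 1024, 4 * 1024, 7 * 1024]  # packets 1, 2, 3 from the diagram
segs = pack(payloads)
print(len(coalesce(segs, pool_buf=16384)))  # 2
print(len(coalesce(segs, pool_buf=4096)))   # 5
```

With 16k pool buffers the five 4k chunks collapse into one 16k frag plus one 4k frag. With 4k pool buffers every chunk stays its own frag, even though the fill ring entries are the same size in both cases.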