Message-ID: <20251025160905.3857885-382-sashal@kernel.org>
Date: Sat, 25 Oct 2025 12:00:13 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
stable@...r.kernel.org
Cc: Dragos Tatulea <dtatulea@...dia.com>,
Tariq Toukan <tariqt@...dia.com>,
Paolo Abeni <pabeni@...hat.com>,
Sasha Levin <sashal@...nel.org>,
hawk@...nel.org,
ilias.apalodimas@...aro.org,
netdev@...r.kernel.org
Subject: [PATCH AUTOSEL 6.17-5.4] page_pool: Clamp pool size to max 16K pages
From: Dragos Tatulea <dtatulea@...dia.com>
[ Upstream commit a1b501a8c6a87c9265fd03bd004035199e2e8128 ]
page_pool_init() returns E2BIG when the page_pool size goes above 32K
pages. As some drivers are configuring the page_pool size according to
the MTU and ring size, there are cases where this limit is exceeded and
the queue creation fails.
The page_pool size doesn't have to cover a full queue, especially for
larger ring sizes. So clamp the size instead of returning an error. Do
this in the core to avoid having each driver do the clamping.
The current limit was deemed too high [1], so it was reduced to 16K to avoid
page waste.
[1] https://lore.kernel.org/all/1758532715-820422-3-git-send-email-tariqt@nvidia.com/
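
For illustration only (not part of the upstream commit message): the
hypothetical sizing below shows how an MTU- and ring-derived pool_size
can exceed the old 32768 limit and what the clamp produces instead. The
ring size and pages-per-packet values are invented, and the snippet is a
self-contained user-space model rather than kernel code.

/* Illustrative only: models the sizing and the new clamp in user space. */
#include <stdio.h>

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int ring_entries = 16384;	/* hypothetical RX ring size */
	unsigned int pages_per_pkt = 3;		/* e.g. jumbo MTU on 4K pages */
	unsigned int pool_size = ring_entries * pages_per_pkt;	/* 49152 */

	if (pool_size > 32768)
		printf("old behaviour: page_pool_init() returns -E2BIG\n");

	/* New behaviour: clamp instead of failing, as in the diff below. */
	printf("new behaviour: ring_qsize = %u\n", min_uint(pool_size, 16384));
	return 0;
}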
Signed-off-by: Dragos Tatulea <dtatulea@...dia.com>
Reviewed-by: Tariq Toukan <tariqt@...dia.com>
Link: https://patch.msgid.link/20250926131605.2276734-2-dtatulea@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@...hat.com>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- The change at `net/core/page_pool.c:213` and
`net/core/page_pool.c:214` stops rejecting large pools and instead
clamps the ptr_ring allocation to 16K entries, which keeps queue setup
from failing with `-E2BIG` while still bounding the cached page
budget. Without this patch, any driver that computes a `pool_size`
above 32768 immediately aborts queue creation; for example,
`mlx5e_alloc_rq()` sets `pp_params.pool_size = pool_size` at
`drivers/net/ethernet/mellanox/mlx5/core/en_main.c:906` and again at
`drivers/net/ethernet/mellanox/mlx5/core/en_main.c:1011`, and on error
it propagates the failure (`goto err_free_by_rq_type`), so the RX queue
never comes up. `stmmac_init_rx_buffers()` follows the same pattern at
`drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2051-2066`, meaning
larger rings or MTU-derived pools currently make the interface
unusable. A hedged sketch of this driver-side pattern follows the list
below.
- The lower cap is safe: when the ptr_ring fills, the existing slow path
already frees excess pages (`page_pool_recycle_in_ring()` at
`net/core/page_pool.c:746` together with the fallback in
`page_pool_put_unrefed_netmem()` at `net/core/page_pool.c:873`), so a
smaller cache only increases occasional allocations but does not change
correctness. No ABI or driver interfaces are touched, and every driver
benefits automatically without per-driver clamps. A toy model of this
fallback follows the sketch below the list.
- This is a minimal, localized fix that prevents hard user-visible
failures (device queues refusing to start) on systems with large RX
rings or jumbo MTUs, making it an excellent candidate for stable
backports.
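
As referenced in the first bullet, here is a hedged sketch of the
driver-side pattern involved. The function name and sizing are
hypothetical; only page_pool_create(), struct page_pool_params and the
fields used below come from the in-tree API, and header locations differ
across the stable trees targeted here (<net/page_pool.h> on older
kernels, <net/page_pool/types.h> and <net/page_pool/helpers.h> on newer
ones).

#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>

/* Hypothetical RX queue setup, loosely following the mlx5/stmmac shape. */
static struct page_pool *example_rq_create_pool(struct device *dev,
						unsigned int ring_entries,
						unsigned int pages_per_pkt)
{
	struct page_pool_params pp = {
		.order		= 0,
		.flags		= PP_FLAG_DMA_MAP,
		/* MTU- and ring-derived size; large rings with jumbo MTUs
		 * push this past 16384 and, previously, past 32768.
		 */
		.pool_size	= ring_entries * pages_per_pkt,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	/* Before the patch this returned ERR_PTR(-E2BIG) for pool_size
	 * above 32768 and the RX queue never came up; with the clamp the
	 * core caps its internal ptr_ring at 16384 entries and creation
	 * succeeds.
	 */
	return page_pool_create(&pp);
}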
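
And a toy user-space model of the fallback the second bullet relies on:
when the clamped recycle ring is full, a page is simply released instead
of cached, so the smaller cache only costs extra page allocations. This
is a conceptual illustration, not page_pool code.

#include <stdbool.h>
#include <stdio.h>

#define RING_CAP 4	/* stands in for the clamped 16384-entry ptr_ring */

static void *ring[RING_CAP];
static unsigned int ring_used;

/* Fast path: cache the page for reuse if the ring has room. */
static bool recycle(void *page)
{
	if (ring_used < RING_CAP) {
		ring[ring_used++] = page;
		return true;
	}
	return false;	/* ring full: caller releases the page instead */
}

int main(void)
{
	int pages[6];

	for (int i = 0; i < 6; i++) {
		if (!recycle(&pages[i]))
			printf("page %d released back to the allocator\n", i);
	}
	return 0;
}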
net/core/page_pool.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e224d2145eed9..1a5edec485f14 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,11 +211,7 @@ static int page_pool_init(struct page_pool *pool,
return -EINVAL;
if (pool->p.pool_size)
- ring_qsize = pool->p.pool_size;
-
- /* Sanity limit mem that can be pinned down */
- if (ring_qsize > 32768)
- return -E2BIG;
+ ring_qsize = min(pool->p.pool_size, 16384);
/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
* DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
--
2.51.0