[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250805003654.2944974-1-kuba@kernel.org>
Date: Mon, 4 Aug 2025 17:36:54 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: davem@...emloft.net
Cc: netdev@...r.kernel.org,
edumazet@...gle.com,
pabeni@...hat.com,
andrew+netdev@...n.ch,
horms@...nel.org,
Jakub Kicinski <kuba@...nel.org>,
David Wei <dw@...idwei.uk>,
michael.chan@...adcom.com,
pavan.chebbi@...adcom.com,
hawk@...nel.org,
ilias.apalodimas@...aro.org,
almasrymina@...gle.com,
sdf@...ichev.me
Subject: [PATCH net v2] net: page_pool: allow enabling recycling late, fix false positive warning
Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.
The queue API expects the following order of calls:
-> mem_alloc
alloc new pp
-> stop
napi_disable
-> start
napi_enable
-> mem_free
free old pp
Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.
Unfortunately, the page_pool_disable_direct_recycling() is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop the new page pool has
to be freed while the NAPI is active (assuming driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).
Freeing the new page pool is technically safe because it hasn't been
used for any packets, yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check if page pool is empty but that'd make the check much less likely
to trigger during development.
Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.
Coincidentally it will also let us re-enable the recycling for the old
pool, if the reconfig failed:
-> mem_alloc (new)
-> stop (old)
# disables direct recycling for old
-> start (new)
# fail!!
-> start (old)
# go back to old pp but direct recycling is lost :(
-> mem_free (new)
The new helper is idempotent to make the life easier for drivers,
which can operate in HDS mode and support zero-copy Rx.
The driver can call the helper twice whether there are two pools
or it has multiple references to a single pool.
Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@...idwei.uk>
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
v2:
- add kdoc
- WARN_ON_ONCE -> WARN_ON
v1: https://lore.kernel.org/20250801173011.2454447-1-kuba@kernel.org
Thanks to David Wei for confirming the problem on bnxt and testing
the fix. I hit this writing the fbnic support for ZC, TBH.
Any driver where NAPI instance gets reused and not reallocated on each
queue restart may have this problem. netdevsim doesn't 'cause the
callbacks can't fail in funny ways there.
CC: michael.chan@...adcom.com
CC: pavan.chebbi@...adcom.com
CC: hawk@...nel.org
CC: ilias.apalodimas@...aro.org
CC: dw@...idwei.uk
CC: almasrymina@...gle.com
CC: sdf@...ichev.me
---
include/net/page_pool/types.h | 2 ++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++++++-
net/core/page_pool.c | 29 +++++++++++++++++++++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 431b593de709..1509a536cb85 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
struct xdp_mem_info;
#ifdef CONFIG_PAGE_POOL
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi);
void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5578ddcb465d..76a4c5ae8000 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
if (BNXT_RX_PAGE_MODE(bp))
pp.pool_size += bp->rx_ring_size / rx_size_fac;
pp.nid = numa_node;
- pp.napi = &rxr->bnapi->napi;
pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
@@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
return PTR_ERR(pool);
}
+static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
+{
+ page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
+ page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);
+}
+
static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
{
u16 mem_size;
@@ -3889,6 +3894,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
rc = bnxt_alloc_rx_page_pool(bp, rxr, cpu_node);
if (rc)
return rc;
+ bnxt_enable_rx_page_pool(rxr);
rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i, 0);
if (rc < 0)
@@ -16031,6 +16037,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
goto err_reset;
}
+ bnxt_enable_rx_page_pool(rxr);
napi_enable_locked(&bnapi->napi);
bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 05e2e22a8f7c..343a6cac21e3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1201,6 +1201,35 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
pool->xdp_mem_id = mem->id;
}
+/**
+ * page_pool_enable_direct_recycling() - mark page pool as owned by NAPI
+ * @pool: page pool to modify
+ * @napi: NAPI instance to associate the page pool with
+ *
+ * Associate a page pool with a NAPI instance for lockless page recycling.
+ * This is useful when a new page pool has to be added to a NAPI instance
+ * without disabling that NAPI instance, to mark the point at which control
+ * path "hands over" the page pool to the NAPI instance. In most cases driver
+ * can simply set the @napi field in struct page_pool_params, and does not
+ * have to call this helper.
+ *
+ * The function is idempotent, but does not implement any refcounting.
+ * Single page_pool_disable_direct_recycling() will disable recycling,
+ * no matter how many times enable was called.
+ */
+void page_pool_enable_direct_recycling(struct page_pool *pool,
+ struct napi_struct *napi)
+{
+ if (READ_ONCE(pool->p.napi) == napi)
+ return;
+ WARN_ON(!napi || pool->p.napi);
+
+ mutex_lock(&page_pools_lock);
+ WRITE_ONCE(pool->p.napi, napi);
+ mutex_unlock(&page_pools_lock);
+}
+EXPORT_SYMBOL(page_pool_enable_direct_recycling);
+
void page_pool_disable_direct_recycling(struct page_pool *pool)
{
/* Disable direct recycling based on pool->cpuid.
--
2.50.1
Powered by blists - more mailing lists