Message-ID: <aI0prRzAJkEXdkEa@mini-arch>
Date: Fri, 1 Aug 2025 13:55:09 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
	pabeni@...hat.com, andrew+netdev@...n.ch, horms@...nel.org,
	David Wei <dw@...idwei.uk>, michael.chan@...adcom.com,
	pavan.chebbi@...adcom.com, hawk@...nel.org,
	ilias.apalodimas@...aro.org, almasrymina@...gle.com,
	sdf@...ichev.me
Subject: Re: [PATCH net] net: page_pool: allow enabling recycling late, fix
 false positive warning

On 08/01, Jakub Kicinski wrote:
> Page pool can have pages "directly" (locklessly) recycled to it,
> if the NAPI that owns the page pool is scheduled to run on the same CPU.
> To make this safe we check that the NAPI is disabled while we destroy
> the page pool. In most cases NAPI and page pool lifetimes are tied
> together so this happens naturally.
> 
> The queue API expects the following order of calls:
>  -> mem_alloc
>     alloc new pp
>  -> stop
>     napi_disable
>  -> start
>     napi_enable
>  -> mem_free
>     free old pp
> 
> Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
> But the NAPIs are only stopped between ->stop and ->start. We created
> page_pool_disable_direct_recycling() to safely shut down the recycling
> in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
> have to worry about recycling any more.
> 
> Unfortunately, page_pool_disable_direct_recycling() is not enough
> to deal with failures that necessitate freeing the _new_ page pool.
> If we hit a failure in ->mem_alloc or ->stop the new page pool has
> to be freed while the NAPI is active (assuming driver attaches the
> page pool to an existing NAPI instance and doesn't reallocate NAPIs).
> 
> Freeing the new page pool is technically safe because it hasn't been
> used for any packets yet, so there can be no recycling. But the check
> in napi_assert_will_not_race() has no way of knowing that. We could
> check whether the page pool is empty, but that'd make the check much
> less likely to trigger during development.
> 
> Add page_pool_enable_direct_recycling(), pairing with
> page_pool_disable_direct_recycling(). It will allow us to create the new
> page pools in "disabled" state and only enable recycling when we know
> the reconfig operation will not fail.
> 
> Coincidentally it will also let us re-enable the recycling for the old
> pool, if the reconfig failed:
> 
>  -> mem_alloc (new)
>  -> stop (old)
>     # disables direct recycling for old
>  -> start (new)
>     # fail!!
>  -> start (old)
>     # go back to old pp but direct recycling is lost :(
>  -> mem_free (new)
> 
> Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
> Tested-by: David Wei <dw@...idwei.uk>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> Thanks to David Wei for confirming the problem on bnxt and testing
> the fix. I hit this writing the fbnic support for ZC, TBH.
> Any driver where the NAPI instance gets reused rather than reallocated
> on each queue restart may have this problem. netdevsim doesn't, because
> its callbacks can't fail in funny ways there.
> 
> CC: michael.chan@...adcom.com
> CC: pavan.chebbi@...adcom.com
> CC: hawk@...nel.org
> CC: ilias.apalodimas@...aro.org
> CC: dw@...idwei.uk
> CC: almasrymina@...gle.com
> CC: sdf@...ichev.me
> ---
>  include/net/page_pool/types.h             |  2 ++
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |  9 ++++++++-
>  net/core/page_pool.c                      | 13 +++++++++++++
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 431b593de709..1509a536cb85 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -265,6 +265,8 @@ struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
>  struct xdp_mem_info;
>  
>  #ifdef CONFIG_PAGE_POOL
> +void page_pool_enable_direct_recycling(struct page_pool *pool,
> +				       struct napi_struct *napi);
>  void page_pool_disable_direct_recycling(struct page_pool *pool);
>  void page_pool_destroy(struct page_pool *pool);
>  void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 5578ddcb465d..76a4c5ae8000 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -3819,7 +3819,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
>  	if (BNXT_RX_PAGE_MODE(bp))
>  		pp.pool_size += bp->rx_ring_size / rx_size_fac;
>  	pp.nid = numa_node;
> -	pp.napi = &rxr->bnapi->napi;
>  	pp.netdev = bp->dev;
>  	pp.dev = &bp->pdev->dev;
>  	pp.dma_dir = bp->rx_dir;
> @@ -3851,6 +3850,12 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
>  	return PTR_ERR(pool);
>  }
>  
> +static void bnxt_enable_rx_page_pool(struct bnxt_rx_ring_info *rxr)
> +{
> +	page_pool_enable_direct_recycling(rxr->head_pool, &rxr->bnapi->napi);
> +	page_pool_enable_direct_recycling(rxr->page_pool, &rxr->bnapi->napi);

We do a bnxt_separate_head_pool() check before calling
page_pool_disable_direct_recycling() on head_pool. Is it safe to skip
that check here because the queue_mgmt callbacks always allocate two
separate pps? (It's not clear to me from a quick glance at
bnxt_alloc_rx_page_pool.)
