Message-ID: <muuya2c2qrnmr3wzxslgkpeufet3rlnitw5dijcaq2gpy4tnwa@5p2xnefrp5rk>
Date: Sat, 20 Sep 2025 09:25:31 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jesper Dangaard Brouer <hawk@...nel.org>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Ilias Apalodimas <ilias.apalodimas@...aro.org>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>, 
	Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, netdev@...r.kernel.org, 
	Tariq Toukan <tariqt@...dia.com>, linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH net-next] page_pool: add debug for release to cache from
 wrong CPU

On Fri, Sep 19, 2025 at 04:57:46PM -0700, Jakub Kicinski wrote:
> On Thu, 18 Sep 2025 11:48:21 +0300 Dragos Tatulea wrote:
> > Direct page releases to cache must be done on the same CPU as where NAPI
> > is running.
> 
> You talk about NAPI..
> 
> >  /* Only allow direct recycling in special circumstances, into the
> >   * alloc side cache.  E.g. during RX-NAPI processing for XDP_DROP use-case.
> >   *
> > @@ -768,6 +795,18 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem,
> >  		return false;
> >  	}
> >  
> > +#ifdef CONFIG_DEBUG_PAGE_POOL_CACHE_RELEASE
> > +	if (unlikely(!page_pool_napi_local(pool))) {
> > +		u32 pp_cpuid = READ_ONCE(pool->cpuid);
> 
> but then you print pp->cpuid?
>
Point taken. I didn't want to replicate half of page_pool_napi_local()
in the error path. Printing the CPU id is also not really important:
the useful information is the stack trace, which points to the code
that recycles to the cache from the wrong CPU.

> The patch seems half-baked. If the NAPI local recycling is incorrect
> the pp will leak a reference and live forever. Which hopefully people
> would notice. Are you adding this check just to double confirm that
> any leaks you're chasing are in the driver, and not in the core?
The point is not to chase leaks but to catch races caused by recycling
to the cache from the wrong CPU. This is how the XDP issue was caught,
where xdp_set_return_frame_no_direct() was not set appropriately for
cpumap [1].

My first approach was to add the check to __page_pool_put_page(), but
then I figured that the warning should live closer to where the actual
assignment happens.
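
To illustrate the kind of check being discussed, here is a minimal
userspace sketch. It is not kernel code and not the patch itself: the
names demo_cache, demo_cache_init() and demo_recycle_in_cache() are
hypothetical, the current CPU is passed in explicitly rather than read
from the scheduler, and rejecting the item on a mismatch is an
assumption made for the example. It only shows the idea of placing the
ownership test right next to the store into a lockless cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CACHE_SIZE 4
#define DEMO_NO_OWNER   (-1)

/* Hypothetical stand-in for a lockless, single-owner alloc-side cache. */
struct demo_cache {
	int owner_cpu;                 /* CPU allowed to touch the cache */
	size_t count;                  /* number of items currently cached */
	void *slots[DEMO_CACHE_SIZE];
	unsigned long wrong_cpu_hits;  /* debug counter for bad callers */
};

static void demo_cache_init(struct demo_cache *c, int owner_cpu)
{
	assert(c != NULL);
	c->owner_cpu = owner_cpu;
	c->count = 0;
	c->wrong_cpu_hits = 0;
}

/*
 * Try to recycle an item directly into the cache.
 * Returns true if the item was stored, false if the caller must fall
 * back to a slower, synchronized path.
 */
static bool demo_recycle_in_cache(struct demo_cache *c, void *item,
				  int cur_cpu)
{
	assert(c != NULL);

	/* A full cache is a normal condition, not a bug. */
	if (c->count == DEMO_CACHE_SIZE)
		return false;

	/*
	 * Debug check placed right before the store: the cache has no
	 * lock, so a caller on any other CPU would race with the owner.
	 * Assumption for this sketch: count the event and reject.
	 */
	if (c->owner_cpu == DEMO_NO_OWNER || c->owner_cpu != cur_cpu) {
		c->wrong_cpu_hits++;
		return false;
	}

	c->slots[c->count++] = item;
	return true;
}
```

Keeping the check next to the store means any caller that reaches the
cache gets tested, whichever path it came through.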

[1] https://lore.kernel.org/all/e60404e2-4782-409f-8596-ae21ce7272c4@kernel.org/

Thanks,
Dragos
