Message-ID: <muuya2c2qrnmr3wzxslgkpeufet3rlnitw5dijcaq2gpy4tnwa@5p2xnefrp5rk>
Date: Sat, 20 Sep 2025 09:25:31 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jesper Dangaard Brouer <hawk@...nel.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, netdev@...r.kernel.org,
Tariq Toukan <tariqt@...dia.com>, linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev
Subject: Re: [PATCH net-next] page_pool: add debug for release to cache from
wrong CPU
On Fri, Sep 19, 2025 at 04:57:46PM -0700, Jakub Kicinski wrote:
> On Thu, 18 Sep 2025 11:48:21 +0300 Dragos Tatulea wrote:
> > Direct page releases to cache must be done on the same CPU as where NAPI
> > is running.
>
> You talk about NAPI..
>
> > /* Only allow direct recycling in special circumstances, into the
> > * alloc side cache. E.g. during RX-NAPI processing for XDP_DROP use-case.
> > *
> > @@ -768,6 +795,18 @@ static bool page_pool_recycle_in_cache(netmem_ref netmem,
> > return false;
> > }
> >
> > +#ifdef CONFIG_DEBUG_PAGE_POOL_CACHE_RELEASE
> > + if (unlikely(!page_pool_napi_local(pool))) {
> > + u32 pp_cpuid = READ_ONCE(pool->cpuid);
>
> but then you print pp->cpuid?
>
Point taken. I didn't want to replicate half of page_pool_napi_local()
in the error path. Printing the CPU id is not really important anyway:
the useful information is in the stack trace, which points to the code
that recycles to the cache from the wrong CPU.
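Here is a minimal userspace sketch of this kind of check. It assumes a toy pool whose lockless cache is owned by a single CPU. The names toy_pool, toy_napi_local(), toy_recycle_in_cache() and TOY_DEBUG_CACHE_RELEASE are hypothetical stand-ins for struct page_pool, page_pool_napi_local(), page_pool_recycle_in_cache() and CONFIG_DEBUG_PAGE_POOL_CACHE_RELEASE. The sketch passes the current CPU in explicitly instead of reading it from the running context. A warn-once flag plus a counter stand in for the kernel's WARN machinery and its stack trace. As an assumption, the check only reports the problem and does not change the recycling behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug switch, standing in for the Kconfig option. */
#define TOY_DEBUG_CACHE_RELEASE 1

#define TOY_CACHE_SIZE 128 /* arbitrary size for the toy cache */

/* Hypothetical toy pool: the lockless cache belongs to owner_cpu only. */
struct toy_pool {
	int owner_cpu;                /* CPU whose "NAPI" context owns the cache */
	unsigned int count;           /* number of items currently cached */
	void *cache[TOY_CACHE_SIZE];
	bool warned;                  /* emulates warn-once semantics */
	unsigned int warnings;        /* counts every bad recycle seen */
};

/* Hypothetical stand-in for page_pool_napi_local(): a plain CPU compare. */
static bool toy_napi_local(const struct toy_pool *pool, int cur_cpu)
{
	return pool->owner_cpu == cur_cpu;
}

/* Push an item into the lockless cache; returns false if the cache is full. */
static bool toy_recycle_in_cache(struct toy_pool *pool, void *item, int cur_cpu)
{
	if (pool->count == TOY_CACHE_SIZE)
		return false; /* caller would fall back to the slow path */

#ifdef TOY_DEBUG_CACHE_RELEASE
	/* Check right before the store, so the report names the exact caller. */
	if (!toy_napi_local(pool, cur_cpu)) {
		pool->warnings++;
		if (!pool->warned) {
			pool->warned = true;
			fprintf(stderr, "toy_pool: cache release from wrong CPU %d (owner %d)\n",
				cur_cpu, pool->owner_cpu);
		}
	}
#endif

	pool->cache[pool->count++] = item; /* the actual assignment into the cache */
	return true;
}
```

Keeping the check next to the store means the report fires on the exact call that touched the cache from the wrong CPU. That is what makes the accompanying stack trace useful.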
> The patch seems half-baked. If the NAPI local recycling is incorrect
> the pp will leak a reference and live forever. Which hopefully people
> would notice. Are you adding this check just to double confirm that
> any leaks you're chasing are in the driver, and not in the core?
The point is not to chase leaks but races caused by recycling to the
cache from the wrong CPU. This is how the XDP issue where
xdp_set_return_frame_no_direct() was not set appropriately for cpumap
was caught [1]. My first approach was to add the check to
__page_pool_put_page(), but then I figured that the warning should
live closer to where the actual assignment into the cache happens.
[1] https://lore.kernel.org/all/e60404e2-4782-409f-8596-ae21ce7272c4@kernel.org/
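The cpumap issue mentioned above comes down to a caller-side opt-out of direct recycling. The sketch below is loosely modeled on how the XDP frame return path downgrades a direct (cache) recycle to the regular path while xdp_set_return_frame_no_direct() is in effect. The names toy_no_direct, toy_return() and toy_cpumap_drop() are hypothetical. A single global flag stands in for the kernel's per-context state. If a context that does not own the pool forgets to set the flag, its returns take the cache path. The debug check is meant to catch exactly that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-context "no direct return" flag. */
static bool toy_no_direct;

enum toy_path {
	TOY_PATH_CACHE, /* direct recycle into the owner's lockless cache */
	TOY_PATH_RING,  /* regular, concurrency-safe recycle path */
};

static void toy_set_no_direct(void)   { toy_no_direct = true; }
static void toy_clear_no_direct(void) { toy_no_direct = false; }

/* Pick the recycle path. napi_direct is the caller's claim that it may use
 * the cache; the flag, when set, overrides that claim. */
static enum toy_path toy_return(bool napi_direct)
{
	if (napi_direct && toy_no_direct)
		napi_direct = false;
	return napi_direct ? TOY_PATH_CACHE : TOY_PATH_RING;
}

/* A cpumap-like consumer running on a foreign CPU: it brackets its
 * XDP_DROP handling with the flag so drops never reach the cache. */
static enum toy_path toy_cpumap_drop(void)
{
	enum toy_path path;

	toy_set_no_direct();
	path = toy_return(true); /* the drop path asks for a direct return */
	toy_clear_no_direct();
	return path;
}
```

Without the set/clear pair in toy_cpumap_drop(), the same call would return TOY_PATH_CACHE from a CPU that does not own the cache.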
Thanks,
Dragos