Message-ID: <29d3e8fa-8cd0-4c93-a685-619758ab5af4@huawei.com>
Date: Thu, 22 May 2025 23:17:32 +0800
From: "dongchenchen (A)" <dongchenchen2@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>, Mina Almasry <almasrymina@...gle.com>
CC: <hawk@...nel.org>, <ilias.apalodimas@...aro.org>, <davem@...emloft.net>,
<edumazet@...gle.com>, <pabeni@...hat.com>, <horms@...nel.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<zhangchangzhong@...wei.com>
Subject: Re: [BUG Report] KASAN: slab-use-after-free in
page_pool_recycle_in_ring
> On Mon, 19 May 2025 17:53:08 -0700 Mina Almasry wrote:
>> On Mon, May 19, 2025 at 3:47 PM Jakub Kicinski <kuba@...nel.org> wrote:
>>> On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
>>>> Clearly this is not working, but I can't tell why.
>>> I think your fix works but for the one line that collects recycling
>>> stats. If we put recycling stats under the producer lock we should
>>> be safe.
>> What are you referring to as the recycle stats? Because I don't think
>> pool->recycle_stats has anything to do with freeing the page_pool.
>>
>> Or do you mean that we should put all the call sites that increment
>> and decrement pool->pages_state_release_cnt and
>> pool->pages_state_hold_cnt under the producer lock?
> No, just the "informational" recycling stats. Looking at what Dong
> Chenchen has shared:
>
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>     spin_unlock(&r->producer_lock);
>                                   // recycled the last page to the pool
>                                   page_pool_release
>                                     page_pool_scrub
>                                       page_pool_empty_ring
>                                         ptr_ring_consume
>                                         page_pool_return_page // releases all pages
>                                     __page_pool_destroy
>                                       free_percpu(pool->recycle_stats);
>                                       kfree(pool)              // pool freed
>
>   recycle_stat_inc(pool, ring);   // UAF read
>
>
> The thread which put the last page into the ring has released the
> producer lock and is now trying to increment the recycling stats.
> That is invalid: the page it put into the ring was de facto the
> reference it held, so once it has put that page in the ring it must
> no longer touch the page pool.
>
> It's not really related to the refcounting itself; the recycling
> stats don't control the lifetime of the object. With your patch
> turning the producer lock into a proper barrier, the remaining
> issue feels to me like a basic UAF?
Hi Jakub,

Maybe we can fix the problem as follows: hold the producer lock across
both the produce and the stat increment, and take the same lock in
page_pool_release() before scrubbing the ring, so the pool cannot be
freed while a producer is still inside the critical section. In
stripped-down form, the ordering change is (a sketch, not the actual
code):
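	/* before: the lock is dropped inside ptr_ring_produce() before
	 * the stat is touched, so the pool may already be freed
	 */
	spin_lock(&r->producer_lock);
	__ptr_ring_produce(&pool->ring, (__force void *)netmem);
	spin_unlock(&r->producer_lock);
	recycle_stat_inc(pool, ring);	/* UAF window */

	/* after: the stat is bumped while the lock is still held, and
	 * page_pool_release() takes the same lock before scrubbing
	 */
	in_softirq = page_pool_producer_lock(pool);
	__ptr_ring_produce(&pool->ring, (__force void *)netmem);
	recycle_stat_inc(pool, ring);	/* safe under the lock */
	page_pool_producer_unlock(pool, in_softirq);

The patch: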
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2..de3fa33d6775 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -707,19 +707,18 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 
 static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 {
+	bool in_softirq;
 	int ret;
 
-	/* BH protection not needed if current is softirq */
-	if (in_softirq())
-		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
-	else
-		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
-	if (!ret) {
+	/* BH protection not needed if current is softirq */
+	in_softirq = page_pool_producer_lock(pool);
+	ret = __ptr_ring_produce(&pool->ring, (__force void *)netmem);
+	if (!ret)
 		recycle_stat_inc(pool, ring);
-		return true;
-	}
-	return false;
+	page_pool_producer_unlock(pool, in_softirq);
+
+	return !ret;
 }
 
 /* Only allow direct recycling in special circumstances, into the
@@ -1091,10 +1090,16 @@ static void page_pool_scrub(struct page_pool *pool)
 
 static int page_pool_release(struct page_pool *pool)
 {
+	bool in_softirq;
 	int inflight;
 
+	/* Acquire the producer lock to make sure we don't race with another
+	 * thread returning a netmem to the ptr_ring.
+	 */
+	in_softirq = page_pool_producer_lock(pool);
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool, true);
+	page_pool_producer_unlock(pool, in_softirq);
 	if (!inflight)
 		__page_pool_destroy(pool);
I have tested this patch.
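
For reference, page_pool_producer_lock()/page_pool_producer_unlock() are
assumed to wrap the old open-coded in_softirq() logic in one place,
roughly like this (a sketch; the exact definitions in
net/core/page_pool.c may differ):

static bool page_pool_producer_lock(struct page_pool *pool)
	__acquires(&pool->ring.producer_lock)
{
	/* BH protection not needed if current is softirq */
	bool in_softirq = in_softirq();

	if (in_softirq)
		spin_lock(&pool->ring.producer_lock);
	else
		spin_lock_bh(&pool->ring.producer_lock);

	return in_softirq;
}

static void page_pool_producer_unlock(struct page_pool *pool,
				      bool in_softirq)
	__releases(&pool->ring.producer_lock)
{
	if (in_softirq)
		spin_unlock(&pool->ring.producer_lock);
	else
		spin_unlock_bh(&pool->ring.producer_lock);
}

With page_pool_release() taking the same lock before scrubbing, the stat
increment can no longer race with __page_pool_destroy() freeing the pool.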
-----
Best Regards,
Dong Chenchen