Message-ID: <29d3e8fa-8cd0-4c93-a685-619758ab5af4@huawei.com>
Date: Thu, 22 May 2025 23:17:32 +0800
From: "dongchenchen (A)" <dongchenchen2@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>, Mina Almasry <almasrymina@...gle.com>
CC: <hawk@...nel.org>, <ilias.apalodimas@...aro.org>, <davem@...emloft.net>,
<edumazet@...gle.com>, <pabeni@...hat.com>, <horms@...nel.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<zhangchangzhong@...wei.com>
Subject: Re: [BUG Report] KASAN: slab-use-after-free in
page_pool_recycle_in_ring
> On Mon, 19 May 2025 17:53:08 -0700 Mina Almasry wrote:
>> On Mon, May 19, 2025 at 3:47 PM Jakub Kicinski <kuba@...nel.org> wrote:
>>> On Mon, 19 May 2025 12:20:59 -0700 Mina Almasry wrote:
>>>> Clearly this is not working, but I can't tell why.
>>> I think your fix works but for the one line that collects recycling
>>> stats. If we put recycling stats under the producer lock we should
>>> be safe.
>> What are you referring to as the recycle stats? Because I don't think
>> pool->recycle_stats has anything to do with freeing the page_pool.
>>
>> Or do you mean that we should put all the call sites that increment
>> and decrement pool->pages_state_release_cnt and
>> pool->pages_state_hold_cnt under the producer lock?
> No, just the "informational" recycling stats. Looking at what Dong
> Chenchen has shared:
>
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>     spin_unlock(&r->producer_lock);
>                                   // recycled the last page to the pool
>                                   page_pool_release
>                                     page_pool_scrub
>                                       page_pool_empty_ring
>                                         ptr_ring_consume
>                                         page_pool_return_page // releases all pages
>                                     __page_pool_destroy
>                                       free_percpu(pool->recycle_stats);
>                                       kfree(pool)              // pool freed
>
>   recycle_stat_inc(pool, ring);   // UAF read
>
>
> The thread which put the last page into the ring has released the
> producer lock and is now trying to increment the recycling stats.
> That is invalid: the page it put into the ring was de facto the
> reference it held, so once it has put that page in the ring it must
> no longer touch the page pool.
>
> It's not really related to the refcounting itself; the recycling
> stats don't control the lifetime of the object. With your patch
> turning the producer lock into a proper barrier, the remaining
> issue feels to me like a basic UAF?
Hi Jakub,

Maybe we can fix the problem as follows: hold the producer lock across
both the produce and the stat increment, and take the same lock in
page_pool_release() before scrubbing the ring, so the pool cannot be
freed while a producer is still inside the critical section. In
stripped-down form, the ordering change is (a sketch, not the actual
code):
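	/* before: the lock is dropped inside ptr_ring_produce() before
	 * the stat is touched, so the pool may already be freed
	 */
	spin_lock(&r->producer_lock);
	__ptr_ring_produce(&pool->ring, (__force void *)netmem);
	spin_unlock(&r->producer_lock);
	recycle_stat_inc(pool, ring);	/* UAF window */

	/* after: the stat is bumped while the lock is still held, and
	 * page_pool_release() takes the same lock before scrubbing
	 */
	in_softirq = page_pool_producer_lock(pool);
	__ptr_ring_produce(&pool->ring, (__force void *)netmem);
	recycle_stat_inc(pool, ring);	/* safe under the lock */
	page_pool_producer_unlock(pool, in_softirq);

The patch: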
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2..de3fa33d6775 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -707,19 +707,18 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 
 static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 {
+	bool in_softirq;
 	int ret;
 
-	/* BH protection not needed if current is softirq */
-	if (in_softirq())
-		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
-	else
-		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
-	if (!ret) {
+	/* BH protection not needed if current is softirq */
+	in_softirq = page_pool_producer_lock(pool);
+	ret = __ptr_ring_produce(&pool->ring, (__force void *)netmem);
+	if (!ret)
 		recycle_stat_inc(pool, ring);
-		return true;
-	}
-	return false;
+	page_pool_producer_unlock(pool, in_softirq);
+
+	return !ret;
 }
 
 /* Only allow direct recycling in special circumstances, into the
@@ -1091,10 +1090,16 @@ static void page_pool_scrub(struct page_pool *pool)
 
 static int page_pool_release(struct page_pool *pool)
 {
+	bool in_softirq;
 	int inflight;
 
+	/* Acquire the producer lock to make sure we don't race with another
+	 * thread returning a netmem to the ptr_ring.
+	 */
+	in_softirq = page_pool_producer_lock(pool);
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool, true);
+	page_pool_producer_unlock(pool, in_softirq);
 	if (!inflight)
 		__page_pool_destroy(pool);
I have tested this patch.
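
For reference, page_pool_producer_lock()/page_pool_producer_unlock() are
assumed to wrap the old open-coded in_softirq() logic in one place,
roughly like this (a sketch; the exact definitions in
net/core/page_pool.c may differ):

static bool page_pool_producer_lock(struct page_pool *pool)
	__acquires(&pool->ring.producer_lock)
{
	/* BH protection not needed if current is softirq */
	bool in_softirq = in_softirq();

	if (in_softirq)
		spin_lock(&pool->ring.producer_lock);
	else
		spin_lock_bh(&pool->ring.producer_lock);

	return in_softirq;
}

static void page_pool_producer_unlock(struct page_pool *pool,
				      bool in_softirq)
	__releases(&pool->ring.producer_lock)
{
	if (in_softirq)
		spin_unlock(&pool->ring.producer_lock);
	else
		spin_unlock_bh(&pool->ring.producer_lock);
}

With page_pool_release() taking the same lock before scrubbing, the stat
increment can no longer race with __page_pool_destroy() freeing the pool.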
-----
Best Regards,
Dong Chenchen