Message-ID: <34f06847-f0d8-4ff3-b8a1-0b1484e27ba8@huawei.com>
Date: Wed, 14 May 2025 11:10:59 +0800
From: "dongchenchen (A)" <dongchenchen2@...wei.com>
To: Mina Almasry <almasrymina@...gle.com>
CC: <hawk@...nel.org>, <ilias.apalodimas@...aro.org>, <davem@...emloft.net>,
<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
<horms@...nel.org>, <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<zhangchangzhong@...wei.com>
Subject: Re: [BUG Report] KASAN: slab-use-after-free in
page_pool_recycle_in_ring
> On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
>> Hello,
>>
>> syzkaller found a UAF issue in page_pool_recycle_in_ring() [1], which is
>> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
>>
>> The root cause is as follows:
>>
>> page_pool_recycle_in_ring
>>   ptr_ring_produce
>>     spin_lock(&r->producer_lock);
>>     WRITE_ONCE(r->queue[r->producer++], ptr)
>>     //recycle last page to pool
>>                                 page_pool_release
>>                                   page_pool_scrub
>>                                     page_pool_empty_ring
>>                                       ptr_ring_consume
>>                                       page_pool_return_page //release all pages
>>                                 __page_pool_destroy
>>                                   free_percpu(pool->recycle_stats);
>>                                   kfree(pool) //free
>>
>>     spin_unlock(&r->producer_lock); //pool->ring uaf read
>>     recycle_stat_inc(pool, ring);
>>
>> The page_pool can be freed while the last page is still being recycled
>> into the ring. After adding a delay to page_pool_recycle_in_ring(), the
>> syzlog[2] reproducer can trigger this issue with high probability. Maybe
>> we can fix it by holding a user_cnt reference on the page pool for the
>> duration of the page recycle process.
>>
>> Does anyone have a good idea for solving this problem?
>>
> Ugh. page_pool_release is not supposed to free the page_pool until all
> inflight pages have been returned. It detects that there are pending
> inflight pages by checking the atomic stats, but in this case it looks
> like we've raced checking the atomic stats with another cpu returning
> a netmem to the ptr ring (and it updates the stats _after_ it already
> returned to the ptr_ring).
>
> My guess here is that page_pool_scrub needs to acquire the
> r->producer_lock to make sure there are no other producers returning
> netmems to the ptr_ring while it's scrubbing them (and checking after
> to make sure there are no inflight netmems).
>
> Can you test this fix? It may need some massaging. I only checked it
> builds. I haven't thought through all the possible races yet:
>
> ```
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 2b76848659418..8654608734773 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
>
> static int page_pool_release(struct page_pool *pool)
> {
> + bool in_softirq;
> int inflight;
>
> +
> + /* Acquire producer lock to make sure we don't race with another thread
> + * returning a netmem to the ptr_ring.
> + */
> + in_softirq = page_pool_producer_lock(pool);
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + page_pool_producer_unlock(pool, in_softirq);
> if (!inflight)
> __page_pool_destroy(pool);
> ```
Hi, Mina!
I tested this patch and the problem still exists. Although the patch makes
the lock access itself safe, the page recycle path can still access the
page pool after the unlock:
page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
    spin_unlock(&r->producer_lock);
    //recycle last page to pool
                                page_pool_release
                                  page_pool_scrub
                                    page_pool_empty_ring
                                      ptr_ring_consume
                                      page_pool_return_page //release all pages
                                __page_pool_destroy
                                  free_percpu(pool->recycle_stats);
                                  kfree(pool) //free
  recycle_stat_inc(pool, ring); //uaf read
Regards,
Dong Chenchen