netdev - Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHS8izOio0bnLp3+Vzt44NVgoJpmPTJTACGjWvOXvxVqFKPSwQ@mail.gmail.com>
Date: Tue, 13 May 2025 13:06:38 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Dong Chenchen <dongchenchen2@...wei.com>
Cc: hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	zhangchangzhong@...wei.com
Subject: Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring

On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
>
> Hello,
>
> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
>
> root cause is as follow:
>
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>       //recycle last page to pool
>                                 page_pool_release
>                                   page_pool_scrub
>                                     page_pool_empty_ring
>                                       ptr_ring_consume
>                                       page_pool_return_page //release all page
>                                   __page_pool_destroy
>                                      free_percpu(pool->recycle_stats);
>                                      kfree(pool) //free
>
>      spin_unlock(&r->producer_lock); //pool->ring uaf read
>   recycle_stat_inc(pool, ring);
>
> page_pool can be free while page pool recycle the last page in ring.
> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
> reproduce this issue with a high probability. Maybe we can fix it by
> holding the user_cnt of the page pool during the page recycle process.
>
> Does anyone have a good idea to solve this problem?
>

Ugh. page_pool_release is not supposed to free the page_pool until all
inflight pages have been returned. It detects that there are pending
inflight pages by checking the atomic stats, but in this case it looks
like we've raced checking the atomic stats with another cpu returning
a netmem to the ptr ring (and it updates the stats _after_ it already
returned to the ptr_ring).

My guess here is that page_pool_scrub needs to acquire the
r->producer_lock to make sure there are no other producers returning
netmems to the ptr_ring while it's scrubbing them (and checking after
to make sure there are no inflight netmems).

Can you test this fix? It may need some massaging. I only checked it
builds. I haven't thought through all the possible races yet:

```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2b76848659418..8654608734773 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)

 static int page_pool_release(struct page_pool *pool)
 {
+       bool in_softirq;
        int inflight;

+
+       /* Acquire producer lock to make sure we don't race with another thread
+        * returning a netmem to the ptr_ring.
+        */
+       in_softirq = page_pool_producer_lock(pool);
        page_pool_scrub(pool);
        inflight = page_pool_inflight(pool, true);
+       page_pool_producer_unlock(pool, in_softirq);
        if (!inflight)
                __page_pool_destroy(pool);
```