lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izOio0bnLp3+Vzt44NVgoJpmPTJTACGjWvOXvxVqFKPSwQ@mail.gmail.com>
Date: Tue, 13 May 2025 13:06:38 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Dong Chenchen <dongchenchen2@...wei.com>
Cc: hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net, 
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	zhangchangzhong@...wei.com
Subject: Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring

On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
>
> Hello,
>
> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
>
> root cause is as follow:
>
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>       //recycle last page to pool
>                                 page_pool_release
>                                   page_pool_scrub
>                                     page_pool_empty_ring
>                                       ptr_ring_consume
>                                       page_pool_return_page //release all page
>                                   __page_pool_destroy
>                                      free_percpu(pool->recycle_stats);
>                                      kfree(pool) //free
>
>      spin_unlock(&r->producer_lock); //pool->ring uaf read
>   recycle_stat_inc(pool, ring);
>
> page_pool can be free while page pool recycle the last page in ring.
> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
> reproduce this issue with a high probability. Maybe we can fix it by
> holding the user_cnt of the page pool during the page recycle process.
>
> Does anyone have a good idea to solve this problem?
>

Ugh. page_pool_release is not supposed to free the page_pool until all
inflight pages have been returned. It detects that there are pending
inflight pages by checking the atomic stats, but in this case it looks
like we've raced checking the atomic stats with another cpu returning
a netmem to the ptr ring (and it updates the stats _after_ it already
returned to the ptr_ring).

My guess here is that page_pool_scrub needs to acquire the
r->producer_lock to make sure there are no other producers returning
netmems to the ptr_ring while it's scrubbing them (and checking after
to make sure there are no inflight netmems).

Can you test this fix? It may need some massaging. I only checked it
builds. I haven't thought through all the possible races yet:

```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2b76848659418..8654608734773 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)

 static int page_pool_release(struct page_pool *pool)
 {
+       bool in_softirq;
        int inflight;

+
+       /* Acquire producer lock to make sure we don't race with another thread
+        * returning a netmem to the ptr_ring.
+        */
+       in_softirq = page_pool_producer_lock(pool);
        page_pool_scrub(pool);
        inflight = page_pool_inflight(pool, true);
+       page_pool_producer_unlock(pool, in_softirq);
        if (!inflight)
                __page_pool_destroy(pool);
```

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ