[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izOio0bnLp3+Vzt44NVgoJpmPTJTACGjWvOXvxVqFKPSwQ@mail.gmail.com>
Date: Tue, 13 May 2025 13:06:38 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Dong Chenchen <dongchenchen2@...wei.com>
Cc: hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
zhangchangzhong@...wei.com
Subject: Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
>
> Hello,
>
> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
>
> root cause is as follow:
>
> page_pool_recycle_in_ring
> ptr_ring_produce
> spin_lock(&r->producer_lock);
> WRITE_ONCE(r->queue[r->producer++], ptr)
> //recycle last page to pool
> page_pool_release
> page_pool_scrub
> page_pool_empty_ring
> ptr_ring_consume
> page_pool_return_page //release all page
> __page_pool_destroy
> free_percpu(pool->recycle_stats);
> kfree(pool) //free
>
> spin_unlock(&r->producer_lock); //pool->ring uaf read
> recycle_stat_inc(pool, ring);
>
> page_pool can be free while page pool recycle the last page in ring.
> After adding a delay to the page_pool_recycle_in_ring(), syzlog[2] can
> reproduce this issue with a high probability. Maybe we can fix it by
> holding the user_cnt of the page pool during the page recycle process.
>
> Does anyone have a good idea to solve this problem?
>
Ugh. page_pool_release is not supposed to free the page_pool until all
inflight pages have been returned. It detects that there are pending
inflight pages by checking the atomic stats, but in this case it looks
like we've raced checking the atomic stats with another cpu returning
a netmem to the ptr ring (and it updates the stats _after_ it already
returned to the ptr_ring).
My guess here is that page_pool_scrub needs to acquire the
r->producer_lock to make sure there are no other producers returning
netmems to the ptr_ring while it's scrubbing them (and checking after
to make sure there are no inflight netmems).
Can you test this fix? It may need some massaging. I only checked it
builds. I haven't thought through all the possible races yet:
```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2b76848659418..8654608734773 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
static int page_pool_release(struct page_pool *pool)
{
+ bool in_softirq;
int inflight;
+
+ /* Acquire producer lock to make sure we don't race with another thread
+ * returning a netmem to the ptr_ring.
+ */
+ in_softirq = page_pool_producer_lock(pool);
page_pool_scrub(pool);
inflight = page_pool_inflight(pool, true);
+ page_pool_producer_unlock(pool, in_softirq);
if (!inflight)
__page_pool_destroy(pool);
```
Powered by blists - more mailing lists