Message-ID: <34f06847-f0d8-4ff3-b8a1-0b1484e27ba8@huawei.com>
Date: Wed, 14 May 2025 11:10:59 +0800
From: "dongchenchen (A)" <dongchenchen2@...wei.com>
To: Mina Almasry <almasrymina@...gle.com>
CC: <hawk@...nel.org>, <ilias.apalodimas@...aro.org>, <davem@...emloft.net>,
<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
<horms@...nel.org>, <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<zhangchangzhong@...wei.com>
Subject: Re: [BUG Report] KASAN: slab-use-after-free in
page_pool_recycle_in_ring
> On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
>> Hello,
>>
>> syzkaller found a UAF issue in page_pool_recycle_in_ring() [1], which is
>> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
>>
>> The root cause is as follows:
>>
>> page_pool_recycle_in_ring
>>   ptr_ring_produce
>>     spin_lock(&r->producer_lock);
>>     WRITE_ONCE(r->queue[r->producer++], ptr)
>>     //recycle last page to pool
>>                                 page_pool_release
>>                                   page_pool_scrub
>>                                     page_pool_empty_ring
>>                                       ptr_ring_consume
>>                                       page_pool_return_page //release all pages
>>                                 __page_pool_destroy
>>                                   free_percpu(pool->recycle_stats);
>>                                   kfree(pool) //free
>>
>>     spin_unlock(&r->producer_lock); //pool->ring uaf read
>>     recycle_stat_inc(pool, ring);
>>
>> The page_pool can be freed while the last page is still being recycled
>> into the ring. After adding a delay to page_pool_recycle_in_ring(), the
>> syzlog[2] reproducer can trigger this issue with high probability. Maybe
>> we can fix it by holding a user_cnt reference on the page pool for the
>> duration of the page recycle process.
>>
>> Does anyone have a good idea for solving this problem?
>>
> Ugh. page_pool_release is not supposed to free the page_pool until all
> inflight pages have been returned. It detects that there are pending
> inflight pages by checking the atomic stats, but in this case it looks
> like we've raced checking the atomic stats with another cpu returning
> a netmem to the ptr ring (and it updates the stats _after_ it already
> returned to the ptr_ring).
>
> My guess here is that page_pool_scrub needs to acquire the
> r->producer_lock to make sure there are no other producers returning
> netmems to the ptr_ring while it's scrubbing them (and checking after
> to make sure there are no inflight netmems).
>
> Can you test this fix? It may need some massaging. I only checked it
> builds. I haven't thought through all the possible races yet:
>
> ```
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 2b76848659418..8654608734773 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
>
> static int page_pool_release(struct page_pool *pool)
> {
> + bool in_softirq;
> int inflight;
>
> +
> + /* Acquire producer lock to make sure we don't race with another thread
> + * returning a netmem to the ptr_ring.
> + */
> + in_softirq = page_pool_producer_lock(pool);
> page_pool_scrub(pool);
> inflight = page_pool_inflight(pool, true);
> + page_pool_producer_unlock(pool, in_softirq);
> if (!inflight)
> __page_pool_destroy(pool);
> ```
Hi, Mina!
I tested this patch and the problem still exists. Although the patch makes
the lock access itself safe, the page recycle path can still access the
page pool after the unlock:
page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
    spin_unlock(&r->producer_lock);
    //recycle last page to pool
                                page_pool_release
                                  page_pool_scrub
                                    page_pool_empty_ring
                                      ptr_ring_consume
                                      page_pool_return_page //release all pages
                                __page_pool_destroy
                                  free_percpu(pool->recycle_stats);
                                  kfree(pool) //free
  recycle_stat_inc(pool, ring); //uaf read
Regards,
Dong Chenchen