Message-ID: <CAHS8izPh5Z-CAJpQzDjhLVN5ye=5i1zaDqb2xQOU3QP08f+Y0Q@mail.gmail.com>
Date: Mon, 19 May 2025 12:20:59 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: "dongchenchen (A)" <dongchenchen2@...wei.com>
Cc: hawk@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, horms@...nel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
zhangchangzhong@...wei.com
Subject: Re: [BUG Report] KASAN: slab-use-after-free in page_pool_recycle_in_ring
On Tue, May 13, 2025 at 8:11 PM dongchenchen (A)
<dongchenchen2@...wei.com> wrote:
>
>
> > On Tue, May 13, 2025 at 1:28 AM Dong Chenchen <dongchenchen2@...wei.com> wrote:
> >> Hello,
> >>
> >> syzkaller found the UAF issue in page_pool_recycle_in_ring[1], which is
> >> similar to syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com.
> >>
> >> The root cause is as follows:
> >>
> >> page_pool_recycle_in_ring
> >>   ptr_ring_produce
> >>     spin_lock(&r->producer_lock);
> >>     WRITE_ONCE(r->queue[r->producer++], ptr)
> >>     //recycle last page to pool
> >>                                  page_pool_release
> >>                                    page_pool_scrub
> >>                                      page_pool_empty_ring
> >>                                        ptr_ring_consume
> >>                                        page_pool_return_page //release all pages
> >>                                    __page_pool_destroy
> >>                                      free_percpu(pool->recycle_stats);
> >>                                      kfree(pool) //free
> >>
> >>     spin_unlock(&r->producer_lock); //pool->ring uaf read
> >>   recycle_stat_inc(pool, ring);
> >>
> >> The page_pool can be freed while it is still recycling the last page into
> >> the ring. After adding a delay to page_pool_recycle_in_ring(), syzlog[2]
> >> can reproduce this issue with high probability. Maybe we can fix it by
> >> holding the user_cnt of the page pool during the page recycle process.
> >>
> >> Does anyone have a good idea to solve this problem?
> >>
> > Ugh. page_pool_release is not supposed to free the page_pool until all
> > inflight pages have been returned. It detects that there are pending
> > inflight pages by checking the atomic stats, but in this case it looks
> > like we've raced checking the atomic stats with another cpu returning
> > a netmem to the ptr ring (and it updates the stats _after_ it already
> > returned to the ptr_ring).
> >
> > My guess here is that page_pool_scrub needs to acquire the
> > r->producer_lock to make sure there are no other producers returning
> > netmems to the ptr_ring while it's scrubbing them (and checking after
> > to make sure there are no inflight netmems).
> >
> > Can you test this fix? It may need some massaging. I've only checked
> > that it builds. I haven't thought through all the possible races yet:
> >
> > ```
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 2b76848659418..8654608734773 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -1146,10 +1146,17 @@ static void page_pool_scrub(struct page_pool *pool)
> >
> > static int page_pool_release(struct page_pool *pool)
> > {
> > + bool in_softirq;
> > int inflight;
> >
> > +
> > + /* Acquire producer lock to make sure we don't race with another thread
> > + * returning a netmem to the ptr_ring.
> > + */
> > + in_softirq = page_pool_producer_lock(pool);
> > page_pool_scrub(pool);
> > inflight = page_pool_inflight(pool, true);
> > + page_pool_producer_unlock(pool, in_softirq);
> > if (!inflight)
> > __page_pool_destroy(pool);
> > ```
> Hi, Mina!
>
> I tested this patch and the problem still exists.
> Although this patch ensures that the lock access is safe, the page recycle
> process can still access the page pool after the unlock.
>
Sorry for the very late reply; I got a bit busy with work.
My initial analysis was wrong, as your test of the candidate fix
shows. I took another look, and here is what I can tell so far. The
full syzbot report is here for reference:
https://syzkaller.appspot.com/bug?extid=204a4382fcb3311f3858
page_pool_release_retry is supposed to block freeing the page_pool
until all netmems have been freed via page_pool_put_unrefed_netmem,
using the inflight logic. What is clear from the syzbot report is that
this inflight logic didn't work properly, because the
page_pool_put_unrefed_netmem call happened after
page_pool_release_retry had already allowed the page_pool to be freed
(__page_pool_destroy had already been called).
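(For context, the retry path is roughly the following, simplified from
memory; page_pool_release() is the function my earlier candidate fix
touched, and it scrubs the rings and only calls __page_pool_destroy()
once it sees zero inflight pages.)
```
/* Rough sketch, simplified: the delayed work keeps calling
 * page_pool_release() until it reports zero inflight netmems; only
 * then is the page_pool actually freed.
 */
static void page_pool_release_retry(struct work_struct *wq)
{
	struct delayed_work *dwq = to_delayed_work(wq);
	struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
	int inflight;

	inflight = page_pool_release(pool);
	if (!inflight)
		return;		/* the pool has been freed at this point */

	/* Netmems are still inflight: check again later. */
	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
}
```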
The inflight logic works by taking the difference between
`pool->pages_state_hold_cnt` and `pool->pages_state_release_cnt`.
pages_state_hold_cnt is incremented when the page_pool allocates a new
page from the buddy allocator. pages_state_release_cnt is incremented
at the end of page_pool_put_unrefed_netmem.
We don't expect new pages to be allocated by the page_pool owner after
page_pool_destroy has been called, so pages_state_hold_cnt should not
move after page_pool_destroy is called, I think.
pages_state_release_cnt should be <= pages_state_hold_cnt at the time
page_pool_destroy is called. Then, once all the inflight netmems have
been freed via page_pool_put_unrefed_netmem,
pool->pages_state_release_cnt should be equal to
pool->pages_state_hold_cnt, and the page_pool should be allowed to be
freed.
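In other words, the inflight check reduces to a signed distance between
two wrapping u32 counters. A tiny standalone illustration of just that
arithmetic (not the kernel code, only to show how the check is meant to
behave):
```
/* Userspace illustration of the inflight accounting: hold_cnt counts
 * pages handed out by the pool, release_cnt counts pages released back.
 * Both are u32 and may wrap, so the distance is taken as a signed s32.
 */
#include <stdint.h>
#include <stdio.h>

static int32_t inflight(uint32_t hold_cnt, uint32_t release_cnt)
{
	/* The signed cast handles u32 wraparound. */
	return (int32_t)(hold_cnt - release_cnt);
}

int main(void)
{
	printf("%d\n", inflight(3, 1));            /* 2 pages still inflight */
	printf("%d\n", inflight(1, UINT32_MAX));   /* wraparound: still 2    */
	printf("%d\n", inflight(5, 5));            /* 0 -> safe to free pool */
	return 0;
}
```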
Clearly this is not working, but I can't tell why. I also notice the
syzbot report is from bpf/test_run.c, but I don't think we have
reports from prod, so it may be a test issue. Some possibilities:
1. Maybe the test is calling a page_pool allocation like
page_pool_dev_alloc_pages in parallel with page_pool_destroy. That may
increment pages_state_hold_cnt unexpectedly?
2. Maybe one of the pages_state_*_cnt overflowed or something?
3. Memory corruption?
I'm afraid I'm not sure. Possibly littering the code with warnings for
unexpected cases would give some insight. For example, I think this
would catch case #1:
```
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4011eb305cee..9fa70c60f9b5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -536,6 +536,9 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
alloc_stat_inc(pool, slow_high_order);
page_pool_set_pp_info(pool, page_to_netmem(page));
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
+
/* Track how many pages are held 'in-flight' */
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, page_to_netmem(page),
@@ -582,6 +585,8 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
page_pool_set_pp_info(pool, netmem);
pool->alloc.cache[pool->alloc.count++] = netmem;
/* Track how many pages are held 'in-flight' */
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, netmem,
pool->pages_state_hold_cnt);
@@ -1271,6 +1276,8 @@ void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
page_pool_set_pp_info(pool, netmem);
+ /* Warn if we're allocating a page on a destroyed page_pool */
+ DEBUG_NET_WARN_ON(pool->destroy_cnt);
pool->pages_state_hold_cnt++;
trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
```
If you have steps to reproduce this, maybe post them and, if you can't
root-cause this further, I'll try to take a look when I get a chance.
--
Thanks,
Mina