Message-ID: <CAHS8izPY9BYWzAVR9LNdSP4+-0TsgOoMXvD658i22VFWHZfvfA@mail.gmail.com>
Date: Mon, 26 May 2025 10:51:26 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: "dongchenchen (A)" <dongchenchen2@...wei.com>
Cc: Yunsheng Lin <linyunsheng@...wei.com>, hawk@...nel.org, ilias.apalodimas@...aro.org, 
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, 
	horms@...nel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	zhangchangzhong@...wei.com, 
	syzbot+204a4382fcb3311f3858@...kaller.appspotmail.com
Subject: Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring

On Mon, May 26, 2025 at 7:47 AM dongchenchen (A)
<dongchenchen2@...wei.com> wrote:
>
>
> > On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >> On 2025/5/23 14:45, Dong Chenchen wrote:
> >>
> >>>   static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> >>>   {
> >>> +     bool in_softirq;
> >>>        int ret;
> >> int -> bool?
> >>
> >>>        /* BH protection not needed if current is softirq */
> >>> -     if (in_softirq())
> >>> -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> -     else
> >>> -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> >>> -
> >>> -     if (!ret) {
> >>> +     in_softirq = page_pool_producer_lock(pool);
> >>> +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> +     if (ret)
> >>>                recycle_stat_inc(pool, ring);
> >>> -             return true;
> >>> -     }
> >>> +     page_pool_producer_unlock(pool, in_softirq);
> >>>
> >>> -     return false;
> >>> +     return ret;
> >>>   }
> >>>
> >>>   /* Only allow direct recycling in special circumstances, into the
> >>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
> >>>
> >>>   static int page_pool_release(struct page_pool *pool)
> >>>   {
> >>> +     bool in_softirq;
> >>>        int inflight;
> >>>
> >>>        page_pool_scrub(pool);
> >>>        inflight = page_pool_inflight(pool, true);
> >>> +     /* Acquire producer lock to make sure producers have exited. */
> >>> +     in_softirq = page_pool_producer_lock(pool);
> >>> +     page_pool_producer_unlock(pool, in_softirq);
> >> Is a compiler barrier needed to ensure compiler doesn't optimize away
> >> the above code?
> >>
> > I don't want to derail this conversation too much, and I suggested a
> > similar fix to this initially, but now I'm not sure I understand why
> > it works.
> >
> > Why is the existing barrier not working and acquiring/releasing the
> > producer lock fixes this issue instead? The existing barrier is the
> > producer thread incrementing pool->pages_state_release_cnt, and
> > page_pool_release() is supposed to block the freeing of the page_pool
> > until it sees the
> > `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
> > producer thread. Any idea why this barrier is not working? AFAIU it
> > should do the exact same thing as acquiring/dropping the producer
> > lock.
>
> Hi, Mina
> As previously mentioned:
> page_pool_recycle_in_ring
>    ptr_ring_produce
>      spin_lock(&r->producer_lock);
>      WRITE_ONCE(r->queue[r->producer++], ptr)
>        //recycle last page to pool, producer + release_cnt = hold_cnt

This is not right. release_cnt != hold_cnt at this point.

release_cnt is only incremented by the producer _after_ the
spin_unlock() and the recycle_stat_inc() have completed. The full call
stack on the producer thread:

page_pool_put_unrefed_netmem
    page_pool_recycle_in_ring
        ptr_ring_produce(&pool->ring, (__force void *)netmem);
            spin_lock(&r->producer_lock);
            __ptr_ring_produce(r, ptr);
            spin_unlock(&r->producer_lock);
        recycle_stat_inc(pool, ring);
    recycle_stat_inc(pool, ring_full);
    page_pool_return_page
        atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
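
A rough user-space model of this accounting, as a sketch only. It
assumes the in-flight count is hold_cnt minus release_cnt, as in
page_pool_inflight(). struct pool_model and the model_*() helpers are
hypothetical names made up for illustration, the ring is reduced to a
slot counter, and nothing here is kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threaded model of the page_pool counters. */
struct pool_model {
	uint32_t hold_cnt;      /* pages handed out by the pool */
	uint32_t release_cnt;   /* pages given back to the page allocator */
	unsigned int ring_used; /* occupied ring slots */
	unsigned int ring_size; /* total ring slots */
};

/* Signed distance, so the result survives u32 wraparound. */
static int32_t model_inflight(const struct pool_model *p)
{
	return (int32_t)(p->hold_cnt - p->release_cnt);
}

/* Models page_pool_recycle_in_ring(): touches the ring, not release_cnt. */
static bool model_recycle_in_ring(struct pool_model *p)
{
	if (p->ring_used == p->ring_size)
		return false;
	p->ring_used++;
	return true;
}

/* Models page_pool_return_page(): the only step in this model that
 * increments release_cnt. */
static void model_return_page(struct pool_model *p)
{
	p->release_cnt++;
}

/* Models the producer path: recycle if there is room, else return. */
static void model_put_page(struct pool_model *p)
{
	if (!model_recycle_in_ring(p))
		model_return_page(p);
}
```

In this model, putting a page into a ring with free space leaves
release_cnt untouched, so model_inflight() stays non-zero right after
the ring write.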

The atomic_inc_return_relaxed() happens after all the lines that could
cause the UAF have already executed. Is the problem that we're using
the _relaxed version of the atomic operation, so the compiler can
reorder it to happen before the spin_unlock(&r->producer_lock) and
before the recycle_stat_inc()?
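
To make the ordering question concrete, here is a user-space analogue
using C11 atomics. It is only a sketch: struct ordering_model and its
helpers are hypothetical, `stat` stands in for the recycle statistic,
and the C11 memory model is not the Linux kernel memory model. It shows
the difference between a relaxed and a release increment, not whether
such reordering actually explains the reported UAF:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins: `stat` for the recycle statistic and
 * `release_cnt` for pool->pages_state_release_cnt. */
struct ordering_model {
	int stat;
	atomic_uint release_cnt;
};

/* Relaxed increment: atomic, but it imposes no ordering between the
 * earlier plain store to m->stat and the counter update as seen by
 * another thread. Returns the new count. */
static unsigned int publish_relaxed(struct ordering_model *m)
{
	m->stat++;
	return atomic_fetch_add_explicit(&m->release_cnt, 1,
					 memory_order_relaxed) + 1;
}

/* Release increment: the earlier store to m->stat is ordered before
 * the counter update for any thread that reads the new count with an
 * acquire load. Returns the new count. */
static unsigned int publish_release(struct ordering_model *m)
{
	m->stat++;
	return atomic_fetch_add_explicit(&m->release_cnt, 1,
					 memory_order_release) + 1;
}

/* Acquire load that pairs with publish_release(). */
static unsigned int observe_acquire(struct ordering_model *m)
{
	return atomic_load_explicit(&m->release_cnt, memory_order_acquire);
}
```

Only the publish_release()/observe_acquire() pair gives the observer a
guarantee that the store to `stat` is visible once the new count is
seen. With publish_relaxed(), C11 makes no such promise.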

-- 
Thanks,
Mina
