linux-kernel - Re: [PATCH 3/3] lib/percpu-refcount: drain ->release() in perpcu_ref

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <Y5prfOjyyjQKUrtH@T590>
Date:   Thu, 15 Dec 2022 08:34:04 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Dennis Zhou <dennis@...nel.org>
Cc:     Hillf Danton <hdanton@...a.com>, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Zhong Jinghua <zhongjinghua@...wei.com>, ming.lei@...hat.com
Subject: Re: [PATCH 3/3] lib/percpu-refcount: drain ->release() in
 perpcu_ref_exit()

On Wed, Dec 14, 2022 at 08:07:28AM -0800, Dennis Zhou wrote:
> Hello,
> 
> On Wed, Dec 14, 2022 at 09:30:08PM +0800, Ming Lei wrote:
> > On Wed, Dec 14, 2022 at 04:16:51PM +0800, Hillf Danton wrote:
> > > On 14 Dec 2022 10:51:01 +0800 Ming Lei <ming.lei@...hat.com>
> > > > The pattern of wait_event(percpu_ref_is_zero()) has been used in several
> > > 
> > > For example?
> > 
> > blk_mq_freeze_queue_wait() and target_wait_for_sess_cmds().
> > 
> > > 
> > > > kernel components, and this way actually has the following risk:
> > > > 
> > > > - percpu_ref_is_zero() can be returned just between
> > > >   atomic_long_sub_and_test() and ref->data->release(ref)
> > > > 
> > > > - given the refcount is found as zero, percpu_ref_exit() could
> > > >   be called, and the host data structure is freed
> > > > 
> > > > - then use-after-free is triggered in ->release() when the user host
> > > >   data structure is freed after percpu_ref_exit() returns
> > > 
> > > The race between exit and the release callback should be considered at the
> > > corresponding callsite, given the comment below, and closed for instance
> > > by synchronizing rcu.
> > > 
> > > /**
> > >  * percpu_ref_put_many - decrement a percpu refcount
> > >  * @ref: percpu_ref to put
> > >  * @nr: number of references to put
> > >  *
> > >  * Decrement the refcount, and if 0, call the release function (which was passed
> > >  * to percpu_ref_init())
> > >  *
> > >  * This function is safe to call as long as @ref is between init and exit.
> > >  */
> > 
> > Not sure if the above comment implies that the callsite should cover the
> > race.
> > 
> > But blk-mq can really avoid the trouble by using the existed call_rcu():
> > 
> 
> I struggle with the dependency on release(). release() itself should not
> block, but a common pattern would be to through a call_rcu() in and

Yes, release() is called with rcu read lock, and I guess the trouble may
be originated from the fact release() may do nothing related with
actual data releasing.

> schedule additional work - see block/blk-cgroup.c, blkg_release().

I believe the pattern is user specific, and the motivation of using call_rcu
can't be just for avoiding such potential race between release() and
percpu_ref_exit().

> 
> I think the dependency really is the completion of release() and the
> work scheduled on it's behalf rather than strictly starting the
> release() callback. This series doesn't preclude that from happening.

Yeah.

For any additional work or sort of thing scheduled in release(), only
the caller can guarantee they are drained before percpu_exit_ref(), so
I agree now it is better for caller to avoid the race.

> 
> /**
>  * percpu_ref_exit - undo percpu_ref_init()
>  * @ref: percpu_ref to exit
>  *
>  * This function exits @ref.  The caller is responsible for ensuring that
>  * @ref is no longer in active use.  The usual places to invoke this
>  * function from are the @ref->release() callback or in init failure path
>  * where percpu_ref_init() succeeded but other parts of the initialization
>  * of the embedding object failed.
>  */
> 
> I think the percpu_ref_exit() comment explains the more common use case
> approach to percpu refcounts. release() triggering percpu_ref_exit() is
> the ideal case.

But most of callers don't use in this way actually.


Thanks, 
Ming