Message-ID: <20160126152846.GO3628@mtj.duckdns.org>
Date: Tue, 26 Jan 2016 10:28:46 -0500
From: Tejun Heo <tj@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
"linux-kernel@...r.kernel.org >> Linux Kernel Mailing List"
<linux-kernel@...r.kernel.org>,
linux-s390 <linux-s390@...r.kernel.org>,
KVM list <kvm@...r.kernel.org>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: regression 4.4: deadlock in with cgroup percpu_rwsem

Hello, Christoph.

On Tue, Jan 26, 2016 at 03:51:57PM +0100, Christoph Hellwig wrote:
> > That's interesting. Can you please elaborate on how kill and exit
> > interact to make things complex?
>
> That we need to first call kill to tear down the reference; we then get
> a release callback in the calling context of the last percpu_ref_put,
> but still need to call percpu_ref_exit from process context.  This
> means that if any percpu_ref_put is from non-process context

Hmmm... why do you need to call percpu_ref_exit() from process
context?  All it does is free the percpu counter and reset the
state, both of which can be done from any context.

> we will always need a work_struct or similar to schedule the final
> percpu_ref_exit. Except when..

I don't think that's true.
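
For illustration, a minimal sketch of the pattern being described here:
percpu_ref_exit() called directly from the release callback, in whatever
context the final put happens.  The my_obj wrapper and function names are
hypothetical, not from this thread.

#include <linux/percpu-refcount.h>
#include <linux/slab.h>

/* Hypothetical object embedding a percpu_ref; names are illustrative. */
struct my_obj {
        struct percpu_ref ref;
        /* ... driver-specific fields ... */
};

/*
 * Runs in the context of the final percpu_ref_put(), which may be
 * atomic (e.g. irq or softirq) context.
 */
static void my_obj_release(struct percpu_ref *ref)
{
        struct my_obj *obj = container_of(ref, struct my_obj, ref);

        percpu_ref_exit(ref);   /* frees the percpu counter; no sleeping */
        kfree(obj);             /* kfree() is fine from atomic context */
}
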
> > > be a percpu_ref_exit_sync that kills the ref and waits for all references
> > > to go away synchronously.
> >
> > That shouldn't be difficult to implement. One minor concern is that
> > it's almost guaranteed that there will be cases where the
> > synchronicity is exposed to userland. Anyways, can you please
> > describe the use case?
>
> We use a completion scheme where percpu_ref_exit is done from the same
> context as percpu_ref_kill, after waiting for the last reference to be
> dropped.  But for these cases exposing the synchronicity to the caller
> (including userland) is actually intentional.
>
> My use case is a new storage target, broadly similar to the SCSI target,
> which happens to exhibit the same behavior. In that case we only want
> to return from the teardown function when all I/O on a 'queue' of sorts
> has finished, for example during module removal.
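
For illustration, a minimal sketch of the completion scheme described
above: kill, wait for the last reference to drop, then exit from the same
process context.  The my_queue structure and names are hypothetical, and
setup (percpu_ref_init(), init_completion()) is omitted.

#include <linux/completion.h>
#include <linux/percpu-refcount.h>

/* Hypothetical queue-like object; names are illustrative only. */
struct my_queue {
        struct percpu_ref       inflight;       /* counts in-flight I/O */
        struct completion       inflight_done;
};

/* Called from the context of the final percpu_ref_put(). */
static void my_queue_inflight_release(struct percpu_ref *ref)
{
        struct my_queue *q = container_of(ref, struct my_queue, inflight);

        complete(&q->inflight_done);
}

/* Teardown path, process context (e.g. module removal). */
static void my_queue_teardown(struct my_queue *q)
{
        percpu_ref_kill(&q->inflight);          /* no new references */
        wait_for_completion(&q->inflight_done); /* last I/O has finished */
        percpu_ref_exit(&q->inflight);          /* free the percpu counter */
}
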
It'd most likely end up doing synchronous destruction in a loop with
each iteration involving a full RCU grace period. If there can be a
lot of devices, it can add up to a substantial amount of time. Maybe
it's okay here but I've already been bitten several times by the exact
same issue.
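
To make the concern concrete, a hypothetical per-device loop built on the
sketch above: each synchronous teardown waits out its own RCU grace period
(the kill -> release path goes through RCU), so N devices pay roughly N
grace periods, whereas killing everything first and only then waiting lets
the grace periods overlap.  The helpers are illustrative, not proposed API.

/* Serialized: total time grows linearly with the number of devices. */
static void my_target_teardown_all(struct my_queue **queues, int nr)
{
        int i;

        for (i = 0; i < nr; i++)
                my_queue_teardown(queues[i]);   /* one grace period each */
}

/* Overlapping: kill everything first, then wait, sharing grace periods. */
static void my_target_teardown_all_batched(struct my_queue **queues, int nr)
{
        int i;

        for (i = 0; i < nr; i++)
                percpu_ref_kill(&queues[i]->inflight);

        for (i = 0; i < nr; i++) {
                wait_for_completion(&queues[i]->inflight_done);
                percpu_ref_exit(&queues[i]->inflight);
        }
}
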
Thanks.
--
tejun