linux-kernel - Re: regression 4.4: deadlock in with cgroup percpu


Message-ID: <20160126152846.GO3628@mtj.duckdns.org>
Date:	Tue, 26 Jan 2016 10:28:46 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Christoph Hellwig <hch@....de>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	"linux-kernel@...r.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@...r.kernel.org>,
	linux-s390 <linux-s390@...r.kernel.org>,
	KVM list <kvm@...r.kernel.org>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: regression 4.4: deadlock in with cgroup percpu_rwsem

Hello, Christoph.

On Tue, Jan 26, 2016 at 03:51:57PM +0100, Christoph Hellwig wrote:
> > That's interesting.  Can you please elaborate on how kill and exit
> > interact to make things complex?
> 
> That we need to first call kill to tear down the reference, then we get
> a release callback which is in the calling context of the last
> percpu_ref_put, but will need to call percpu_ref_exit from process context
> again.  This means if any percpu_ref_put is from non-process context

Hmmm... why do you need to call percpu_ref_exit() from process
context?  All it does is free the percpu counter and reset the
state, both of which can be done from any context.
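
To illustrate the point, here is a minimal single-threaded userspace
sketch, not the kernel implementation.  All toy_* names are
hypothetical stand-ins for the percpu_ref calls discussed above, and
the model ignores the percpu/atomic mode switch and RCU entirely.  It
only shows that an exit step which does nothing but free the counter
and reset state can run directly in the release callback, with no
deferral to a work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy userspace model; every toy_* name is hypothetical, not kernel API. */
struct toy_ref {
	long *percpu_count;	/* stands in for the percpu counter allocation */
	long count;		/* plain counter; the real one is percpu/atomic */
	bool dead;
	void (*release)(struct toy_ref *ref);
};

static int toy_ref_init(struct toy_ref *ref,
			void (*release)(struct toy_ref *ref))
{
	ref->percpu_count = calloc(4, sizeof(long));	/* pretend 4 CPUs */
	if (!ref->percpu_count)
		return -1;
	ref->count = 1;		/* initial reference, dropped by kill */
	ref->dead = false;
	ref->release = release;
	return 0;
}

static void toy_ref_get(struct toy_ref *ref)
{
	ref->count++;
}

/* The last put invokes release in the caller's context. */
static void toy_ref_put(struct toy_ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);
}

/* Kill marks the ref dead and drops the initial reference. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->dead = true;
	toy_ref_put(ref);
}

/* Exit only frees the counter and resets state; it never blocks. */
static void toy_ref_exit(struct toy_ref *ref)
{
	free(ref->percpu_count);
	ref->percpu_count = NULL;
}

/* Release calls exit directly instead of scheduling deferred work. */
static void toy_release(struct toy_ref *ref)
{
	toy_ref_exit(ref);
}
```

In this model, a caller that takes one extra reference, kills the ref,
and then drops the extra reference sees the counter freed inside the
final toy_ref_put(), whatever context that put happens to run in.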

> we will always need a work_struct or similar to schedule the final
> percpu_ref_exit.  Except when..

I don't think that's true.

> > > be a percpu_ref_exit_sync that kills the ref and waits for all references
> > > to go away synchronously.
> > 
> > That shouldn't be difficult to implement.  One minor concern is that
> > it's almost guaranteed that there will be cases where the
> > synchronicity is exposed to userland.  Anyways, can you please
> > describe the use case?
> 
> We use this completion scheme where the percpu_ref_exit is done from
> the same context as the percpu_ref_kill which previously waits for
> the last reference drop.  But for these cases exposing the synchronicity
> to the caller (including userland) actually is intentional.
> 
> My use case is a new storage target, broadly similar to the SCSI target,
> which happens to exhibit the same behavior.  In that case we only want
> to return from the teardown function when all I/O on a 'queue' of sorts
> has finished, for example during module removal.

It'd most likely end up doing synchronous destruction in a loop with
each iteration involving a full RCU grace period.  If there can be a
lot of devices, it can add up to a substantial amount of time.  Maybe
it's okay here but I've already been bitten several times by the exact
same issue.
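
A rough cost model of that concern, as a sketch under stated
assumptions: the function names are hypothetical, the grace period
length is a caller-supplied guess rather than a measurement, and the
batched variant (kill every ref first, then wait for all of them) is
an assumption added here for contrast, not something proposed in this
thread.

```c
#include <assert.h>

/* Hypothetical model; names and numbers are illustrative only. */

/*
 * Kill one ref and wait for it before touching the next device:
 * each iteration pays for its own grace period.
 */
static unsigned long serial_teardown_ms(unsigned int ndev,
					unsigned long gp_ms)
{
	unsigned long total = 0;
	unsigned int i;

	for (i = 0; i < ndev; i++)
		total += gp_ms;		/* kill + wait for device i */
	return total;
}

/*
 * Assumed alternative: kill all refs first, then wait for all of
 * them.  The waits overlap, modeled here as one grace period.
 */
static unsigned long batched_teardown_ms(unsigned int ndev,
					 unsigned long gp_ms)
{
	return ndev ? gp_ms : 0;
}
```

With an assumed 20 ms grace period and 1000 devices, the serial loop
models 20000 ms of waiting, against 20 ms for the batched variant.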

Thanks.

-- 
tejun