Message-ID: <20160126145157.GA31177@lst.de>
Date:	Tue, 26 Jan 2016 15:51:57 +0100
From:	Christoph Hellwig <hch@....de>
To:	Tejun Heo <tj@...nel.org>
Cc:	Christoph Hellwig <hch@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	"linux-kernel@...r.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@...r.kernel.org>,
	linux-s390 <linux-s390@...r.kernel.org>,
	KVM list <kvm@...r.kernel.org>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: regression 4.4: deadlock in with cgroup percpu_rwsem

On Mon, Jan 25, 2016 at 02:38:36PM -0500, Tejun Heo wrote:
> On Mon, Jan 25, 2016 at 09:49:42AM +0100, Christoph Hellwig wrote:
> > FYI, my use case was also related to percpu-ref.  The percpu ref API
> > is unfortunately really hard to use and will almost always involve
> > a work queue due to the complex interaction between percpu_ref_kill
> > and percpu_ref_exit.  One thing that would help a lot of callers would
> 
> That's interesting.  Can you please elaborate on how kill and exit
> interact to make things complex?

We first need to call percpu_ref_kill to tear down the reference.  We then
get a release callback, which runs in the calling context of the last
percpu_ref_put, but percpu_ref_exit has to be called from process context
again.  This means that if any percpu_ref_put comes from non-process
context, we always need a work_struct or similar to schedule the final
percpu_ref_exit.  Except when...
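
The ordering described above can be sketched as a small single-threaded
userspace model.  This is not kernel code: every model_* name is
hypothetical, a plain counter stands in for the percpu_ref, and a flag
stands in for queueing a work_struct.  It only illustrates that the release
callback runs in the context of the last put and has to defer the exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel work_struct: one deferred function. */
struct model_work {
	void (*fn)(struct model_work *work);
	bool queued;
};

/* Hypothetical stand-in for a percpu_ref: a plain counter and a release hook. */
struct model_ref {
	long count;
	bool exited;
	void (*release)(struct model_ref *ref);
	struct model_work exit_work;
};

/* Models percpu_ref_exit: only valid once every reference is gone. */
static void model_ref_exit(struct model_ref *ref)
{
	assert(ref->count == 0);
	ref->exited = true;
}

/* Work function: runs later, from the context that is allowed to call exit. */
static void model_exit_work_fn(struct model_work *work)
{
	struct model_ref *ref = (struct model_ref *)
		((char *)work - offsetof(struct model_ref, exit_work));

	model_ref_exit(ref);
}

/* Release callback: runs in the context of the last put, so it only queues. */
static void model_release(struct model_ref *ref)
{
	ref->exit_work.fn = model_exit_work_fn;
	ref->exit_work.queued = true;
}

static void model_ref_init(struct model_ref *ref)
{
	ref->count = 1;			/* initial reference, dropped by kill */
	ref->exited = false;
	ref->release = model_release;
	ref->exit_work.fn = NULL;
	ref->exit_work.queued = false;
}

static void model_ref_get(struct model_ref *ref)
{
	ref->count++;
}

static void model_ref_put(struct model_ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);
}

/* Models percpu_ref_kill: drop the initial reference. */
static void model_ref_kill(struct model_ref *ref)
{
	model_ref_put(ref);
}

/* Models the worker running the queued item in process context. */
static void model_run_queued_work(struct model_ref *ref)
{
	if (ref->exit_work.queued) {
		ref->exit_work.queued = false;
		ref->exit_work.fn(&ref->exit_work);
	}
}

/* Returns 0 if exit happened only after the deferred work ran. */
static int model_deferred_exit_demo(void)
{
	struct model_ref ref;

	model_ref_init(&ref);
	model_ref_get(&ref);		/* an in-flight user of the ref */
	model_ref_kill(&ref);		/* teardown starts: count 2 -> 1 */
	model_ref_put(&ref);		/* last put: release queues the work */
	if (ref.exited)
		return -1;		/* exit must not run in the put context */
	model_run_queued_work(&ref);
	return ref.exited ? 0 : -1;
}
```

Calling model_deferred_exit_demo() walks through kill, last put, and the
deferred exit in that order.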

> > be a percpu_ref_exit_sync that kills the ref and waits for all references
> > to go away synchronously.
> 
> That shouldn't be difficult to implement.  One minor concern is that
> it's almost guaranteed that there will be cases where the
> synchronicity is exposed to userland.  Anyways, can you please
> describe the use case?

We use a completion scheme in which the caller of percpu_ref_kill waits
for the last reference to be dropped and then calls percpu_ref_exit from
that same context.  In these cases, exposing the synchronicity to the
caller (including userland) is actually intentional.

My use case is a new storage target, broadly similar to the SCSI target,
which happens to exhibit the same behavior.  In that case we only want
to return from the teardown function when all I/O on a 'queue' of sorts
has finished, for example during module removal.
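
A rough userspace sketch of that synchronous teardown, assuming POSIX
threads: a mutex and condition variable play the role of the completion,
and a worker thread plays the role of in-flight I/O.  All sync_* names are
hypothetical and this is not the proposed percpu_ref_exit_sync; it only
shows the teardown path blocking until the last reference is dropped and
then finishing in the same context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: counter plus a "completion" built from mutex + condvar. */
struct sync_ref {
	pthread_mutex_t lock;
	pthread_cond_t released;
	long count;
	bool done;	/* set by the last put, like a release callback */
	bool exited;	/* set by the teardown path after the wait */
};

static void sync_ref_init(struct sync_ref *ref)
{
	pthread_mutex_init(&ref->lock, NULL);
	pthread_cond_init(&ref->released, NULL);
	ref->count = 1;		/* initial reference, dropped by teardown */
	ref->done = false;
	ref->exited = false;
}

static void sync_ref_get(struct sync_ref *ref)
{
	pthread_mutex_lock(&ref->lock);
	ref->count++;
	pthread_mutex_unlock(&ref->lock);
}

static void sync_ref_put(struct sync_ref *ref)
{
	pthread_mutex_lock(&ref->lock);
	if (--ref->count == 0) {
		ref->done = true;	/* "release": signal the waiter */
		pthread_cond_broadcast(&ref->released);
	}
	pthread_mutex_unlock(&ref->lock);
}

/* Kill, wait for the last reference, then "exit", all in the caller's context. */
static void sync_ref_kill_and_exit(struct sync_ref *ref)
{
	sync_ref_put(ref);	/* models kill: drop the initial reference */

	pthread_mutex_lock(&ref->lock);
	while (!ref->done)
		pthread_cond_wait(&ref->released, &ref->lock);
	ref->exited = true;	/* models exit, after the wait has finished */
	pthread_mutex_unlock(&ref->lock);
}

/* Models an I/O completing on another thread and dropping its reference. */
static void *sync_io_worker(void *arg)
{
	sync_ref_put(arg);
	return NULL;
}

/* Returns 0 if teardown returned only after the worker's reference was gone. */
static int sync_teardown_demo(void)
{
	struct sync_ref ref;
	pthread_t worker;
	bool ok;

	sync_ref_init(&ref);
	sync_ref_get(&ref);	/* reference held by the in-flight I/O */
	if (pthread_create(&worker, NULL, sync_io_worker, &ref) != 0)
		return -1;

	sync_ref_kill_and_exit(&ref);	/* blocks until the worker's put */
	ok = ref.exited && ref.count == 0;

	pthread_join(worker, NULL);
	pthread_mutex_destroy(&ref.lock);
	pthread_cond_destroy(&ref.released);
	return ok ? 0 : -1;
}
```

Because the waiter sleeps, this shape only fits callers that can block,
which matches the teardown and module removal case described above.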