Date:	Tue, 19 May 2009 12:49:14 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Johannes Berg <johannes@...solutions.net>,
	Ingo Molnar <mingo@...e.hu>,
	Zdenek Kabelac <zdenek.kabelac@...il.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Gautham R Shenoy <ego@...ibm.com>
Subject: Re: INFO: possible circular locking dependency at
 cleanup_workqueue_thread

On Tue, 2009-05-19 at 11:13 +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-19 at 00:14 +0200, Oleg Nesterov wrote:
> > On 05/18, Peter Zijlstra wrote:
> > >
> > > On Mon, 2009-05-18 at 22:16 +0200, Oleg Nesterov wrote:
> > > > On 05/18, Peter Zijlstra wrote:
> > > > >
> > > > > On Mon, 2009-05-18 at 21:47 +0200, Oleg Nesterov wrote:
> > > > > >
> > > > > > This output looks obviously wrong, Z does not depend on L1 or any
> > > > > > other lock.
> > > > >
> > > > > It does, L1 -> L2 -> Z as per 1 and 2
> > > > > which 3 obviously reverses.
> > > >
> > > > Yes, yes, I see. And, as I said, I can't explain what I mean.
> > > >
> > > > I mean... The output above looks as if we take L1 and Z in wrong order.
> > > > But Z has nothing to do with this deadlock, it can't depend on any lock
> > > > from the correctness pov. Except yes, we have it in L1->L2->Z->L1 cycle.
> > >
> > > AB-BC-CA deadlock
> > >
> > > Thread 1		Thread 2		Thread 3
> > >
> > > L(L1)
> > > 			L(L2)
> > > 						L(Z)
> > > L(L2)
> > > 			L(Z)
> > > 						L(L1)
> > 
> > Sure. Now Z really depends on L1. But if you change Thread 3 to take yet
> > another unique lock X under Z, then lockdep will complain that X depends
> > on L1, not Z.
> > 
> > To clarify, I do not say the output is buggy. I only meant it could be
> > better. But I don't understand how to improve it.
> > 
> > If we return to the original bug report, perhaps cpu_add_remove_lock
> > has nothing to do with this problem... we could see similar output
> > if device_pm_lock() is called from a work_struct.
> > 
> > > And you're saying, we can't have that deadlock because we don't have the
> > > 3 separate functions?
> > 
> > No,
> > 
> > > That is, there is no concurrency on Z because its always taken under L2?
> > 
> > Yes, nobody else can hold Z when we take L2.
> > 
> > But this wasn't my point.
> > 
> > > For those situations we have the spin_lock_nest_lock(x, y) annotation,
> > > where we say there cannot be any concurrency on any instance x of
> > > class X, because all such locks are always taken under y.
> > 
> > We can just kill L(Z) instead of annotating, this changes nothing from
> > the correctness pov, we have the same deadlock. But the output becomes
> > very clear: L1 depends on L2.
> > 
> > 
> > OK, please forget. Not sure why I started this thread. Just because I
> > was surprised a bit when I figured out that lockdep's output does not
> > match my naive expectations.
> 
> Well, since you're quite versed in the field, I'm guessing other people
> might find it even more confusing -- so it may well be worth exploring
> this situation a bit further, if only to see if we can make lockdep
> output 'easier'.
> 
> There is a solution to this, which Gautham suggested a while back: we
> could make lockdep scan a lock's (Z) dependency chains and, if in every
> such chain a particular other lock (L2) was taken, ignore that lock's
> (Z) dependencies for the circular analysis at hand.
> 
> That would mean we would not flag the Z->L1 dependency as violating the
> existing one, because we would ignore L2->Z (since L2 is held across
> every acquisition of Z); instead we would fail on the next line of your
> initial program, the L2->L1 dependency.
> 
> Implementing this, however, might be slightly less trivial than the
> explanation suggests -- but it would rid us of the need for the
> spin_lock_nest_lock() annotation.

Ingo pointed out that this would weaken deadlock detection: lockdep would
have to observe an acquisition of Z outside of L2 before reporting the
problem, and such an acquisition might come from a very rare, though
real, code path.

Another possible way might be to find the smallest cycle instead of just
any (the first) cycle.
