Message-ID: <ce4e1130-203b-095d-be4e-5b3a05a08ba7@redhat.com>
Date:   Fri, 24 Sep 2021 20:30:20 -0400
From:   Waiman Long <llong@...hat.com>
To:     paulmck@...nel.org, Waiman Long <llong@...hat.com>
Cc:     peterz@...radead.org, mingo@...hat.com, will@...nel.org,
        boqun.feng@...il.com, linux-kernel@...r.kernel.org, richard@....at
Subject: Re: Confusing lockdep splat

On 9/24/21 6:43 PM, Paul E. McKenney wrote:
> On Fri, Sep 24, 2021 at 05:41:17PM -0400, Waiman Long wrote:
>> On 9/24/21 5:02 PM, Paul E. McKenney wrote:
>>> Hello!
>>>
>>> I got the lockdep splat below from an SRCU-T rcutorture run, which uses
>>> a !SMP !PREEMPT kernel.  This is a random event, and about half the time
>>> it happens within an hour or two.  My reproducer (on current -rcu "dev"
>>> branch for a 16-CPU system) is:
>>>
>>> 	tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 16 --configs "16*SRCU-T" --duration 7200
>>>
>>> My points of confusion are as follows:
>>>
>>> 1.	The locks involved in this deadlock cycle are irq-disabled
>>> 	raw spinlocks.  The claimed deadlock cycle uses two CPUs.
>>> 	There is only one CPU.  There is no possibility of preemption
>>> 	or interrupts.  So how can this deadlock actually happen?
>>>
>>> 2.	If there was more than one CPU, then yes, there would be
>>> 	a deadlock.  The PI lock is acquired by the wakeup code after
>>> 	acquiring the workqueue lock, and rcutorture tests the new ability
>>> 	of the scheduler to hold the PI lock across rcu_read_unlock(),
>>> 	and while it is at it, across the rest of the unlock primitives.
>>>
>>> 	But if there was more than one CPU, Tree SRCU would be used
>>> 	instead of Tiny SRCU, and there would be no wakeup invoked from
>>> 	srcu_read_unlock().
>>>
>>> 	Given only one CPU, there is no way to complete the deadlock
>>> 	cycle.
>>>
>>> For now, I am working around this by preventing rcutorture from holding
>>> the PI lock across Tiny srcu_read_unlock().
>>>
>>> Am I missing something subtle here?
>> I would say that the lockdep code just doesn't have enough intelligence to
>> identify that deadlock is not possible in this special case. There are
>> certainly false positives, and it can be hard to get rid of them.
> Would it make sense for lockdep to filter out reports involving more
> than one CPU unless there is at least one sleeplock in the cycle?
>
> Of course, it gets more complicated when interrupts are involved...

Actually, lockdep keeps track of all the possible lock orderings and puts
out a splat whenever these lock orderings suggest that a circular
deadlock is possible. It doesn't keep track of whether a lock is sleepable
or not. Also, lockdep deals with lock classes, each of which can have many
instances. So the pi_lock in every task_struct is treated as the same
lock from lockdep's perspective. We can't treat different instances
separately, or we would run out of lockdep table space very quickly.

Cheers,
Longman
