Message-ID: <f86f666f-25f0-a742-b87e-bcf7ccbcca1e@redhat.com>
Date:   Fri, 16 Dec 2022 23:41:11 -0500
From:   Waiman Long <longman@...hat.com>
To:     Al Viro <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Wei Chen <harperchen1110@...il.com>, linux-ide@...r.kernel.org,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        syzbot <syzkaller@...glegroups.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Chuck Lever <chuck.lever@...cle.com>,
        Jeff Layton <jlayton@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: possible deadlock in __ata_sff_interrupt

On 12/16/22 22:05, Al Viro wrote:
> On Fri, Dec 16, 2022 at 08:31:54PM -0600, Linus Torvalds wrote:
>> Ok, let's bring in Waiman for the rwlock side.
>>
>> On Fri, Dec 16, 2022 at 5:54 PM Boqun Feng <boqun.feng@...il.com> wrote:
>>> Right, for a reader not in_interrupt(), it may be blocked by a random
>>> waiting writer because of the fairness, even if the lock is currently held
>>> by a reader:
>>>
>>>          CPU 1                   CPU 2           CPU 3
>>>          read_lock(&tasklist_lock); // get the lock
>>>
>>>                                                  write_lock_irq(&tasklist_lock); // wait for the lock
>>>
>>>                                  read_lock(&tasklist_lock); // cannot get the lock because of the fairness
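
[ For reference, this behaviour comes from the reader slowpath in
  kernel/locking/qrwlock.c, which only exempts in_interrupt() readers
  from the fair queueing.  Quoted roughly from memory, so treat it as a
  sketch rather than the exact code: ]

void queued_read_lock_slowpath(struct qrwlock *lock)
{
	/* Readers come here when they cannot get the lock without waiting. */
	if (unlikely(in_interrupt())) {
		/*
		 * Readers in interrupt context ignore queued writers and
		 * only wait for an active writer to go away (unfair path).
		 */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
		return;
	}

	/*
	 * Everyone else queues up behind any waiting writer (fair path),
	 * which is why CPU 3 above blocks even though CPU 1 only holds
	 * the lock for read.
	 */
	atomic_sub(_QR_BIAS, &lock->cnts);
	arch_spin_lock(&lock->wait_lock);
	atomic_add(_QR_BIAS, &lock->cnts);
	atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
	arch_spin_unlock(&lock->wait_lock);
}
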
>> But this should be ok - because CPU1 can make progress and eventually
>> release the lock.
>>
>> So the tasklist_lock use is fine on its own - the reason interrupts
>> are special is because an interrupt on CPU 1 taking the lock for
>> reading would deadlock otherwise. As long as it happens on another
>> CPU, the original CPU should then be able to make progress.
>>
>> But the problem here seems to be that *another* lock is also involved
>> (in this case apparently "host->lock"), and now if CPU1 and CPU2 take
>> these two locks in a different order, you can get an ABBA deadlock.
>>
>> And apparently our lockdep machinery doesn't catch that issue, so it
>> doesn't get flagged.
> Lockdep has actually caught that; the locks involved are mentioned in the
> report (https://marc.info/?l=linux-ide&m=167094379710177&w=2).  The form
> of the report could have been better, but if anything, it doesn't mention the
> potential involvement of a tasklist_lock writer, which is what turns this
> into a deadlock.
>
> OTOH, that's more or less implicit for the entire class:
>
> read_lock(A)		[non-interrupt]
> 			local_irq_disable()	local_irq_disable()
> 			spin_lock(B)		write_lock(A)
> 			read_lock(A)
> [in interrupt]
> spin_lock(B)
>
> is what that sort of report is about.  In this case A is tasklist_lock,
> B is host->lock.  Possible call chains for CPU1 and CPU2 are reported...
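
[ Spelling that out with the actual locks -- the call sites below are
  illustrative, not the exact chains quoted in the lockdep splat: ]

	CPU 1 (process context):
		read_lock(&tasklist_lock);	// A, reader, irqs on
		<ata interrupt>
		spin_lock(&host->lock);		// B, from the irq handler

	CPU 2 (process context, irqs off):
		spin_lock_irq(&host->lock);	// B
		read_lock(&tasklist_lock);	// A, queues behind CPU 3

	CPU 3:
		write_lock_irq(&tasklist_lock);	// queued writer, waits on CPU 1

[ CPU 2's reader blocks behind CPU 3's writer while still holding
  host->lock, CPU 1's interrupt spins on host->lock and so never lets
  go of tasklist_lock, and CPU 3's writer never gets it: nobody makes
  progress. ]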
>
> I wonder why analogues of that hadn't been reported for other SCSI hosts -
> it's a really common pattern there...
>
>> I'm not sure what the lockdep rules for rwlocks are, but maybe lockdep
>> treats rwlocks as being _always_ unfair, not knowing about that "it's
>> only unfair when it's in interrupt context".
>>
>> Maybe we need to always make rwlock unfair? Possibly only for tasklist_lock?
That may not be a good idea, as the cacheline bouncing problem would come
back and reduce performance.
> ISTR threads about the possibility of explicit read_lock_unfair()...

Another possible alternative is to treat the read_lock as unfair when
interrupts have been disabled, since I think we should keep the
interrupt-disabled interval as short as possible.
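
Something like the following in queued_read_lock_slowpath() -- untested
and only meant to illustrate the idea, with the irqs_disabled() check
being the new part:

	/*
	 * Readers come here when they cannot get the lock without waiting
	 */
	if (unlikely(in_interrupt() || irqs_disabled())) {
		/*
		 * Take the unfair path whenever the reader cannot be
		 * interrupted: spin until no writer holds the lock,
		 * ignoring any queued writers, instead of joining the
		 * wait queue.
		 */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
		return;
	}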

Thoughts?

Cheers,
Longman
