linux-kernel - Re: possible deadlock in __ata_sff

Message-ID: <Y5vo00v2F4zVKeug@ZenIV>
Date:   Fri, 16 Dec 2022 03:41:07 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>
Cc:     Wei Chen <harperchen1110@...il.com>, linux-ide@...r.kernel.org,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        syzbot <syzkaller@...glegroups.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Chuck Lever <chuck.lever@...cle.com>,
        Jeff Layton <jlayton@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: possible deadlock in __ata_sff_interrupt

On Fri, Dec 16, 2022 at 10:44:06AM +0900, Damien Le Moal wrote:

> The original & complete lockdep splat is in the report email here:
> 
> https://marc.info/?l=linux-ide&m=167094379710177&w=2
> 
> It looks like a spinlock is taken for the fasync stuff without irq
> disabled and that same spinlock is needed in kill_fasync() which is
> itself called (potentially) with IRQ disabled. Hence the splat. In any
> case, that is how I understand the issue. But as mentioned above, given
> that I can see many drivers calling kill_fasync() with irq disabled, I
> wonder if this is a genuine potential problem or a false negative.

OK, I'm about to fall asleep, so I might very well be missing something
obvious, but...

CPU1: ptrace(2)
	ptrace_check_attach()
		read_lock(&tasklist_lock);

CPU2: setpgid(2)
	write_lock_irq(&tasklist_lock);
	spins

CPU1: takes an interrupt that would call kill_fasync().  grep and the
first instance of kill_fasync() is in hpet_interrupt() - it's not
something exotic.  IRQs disabled on CPU2 won't stop it.
	kill_fasync(..., SIGIO, ...)
		kill_fasync_rcu()
			read_lock_irqsave(&fa->fa_lock, flags);
			send_sigio()
				read_lock_irqsave(&fown->lock, flags);
				read_lock(&tasklist_lock);

... and CPU1 spins as well.

It's not a matter of kill_fasync() called with IRQs disabled; the
problem is kill_fasync() called from interrupt taken while holding
tasklist_lock at least shared.  Somebody trying to grab it on another
CPU exclusive before we get to send_sigio() from kill_fasync() will
end up spinning and will make us spin as well.
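
The interleaving above can be replayed in a toy userspace model.  This is
only a sketch, not kernel code: WriterPreferringRWLock, replay_scenario(),
has_cycle() and the context names "cpu1-task", "cpu2" and "cpu1-irq" are
all made up for illustration.  The model assumes that a new reader has to
wait whenever a writer is already queued on the lock; that assumption is
what the scenario depends on, and the sketch makes no claim about how the
kernel's rwlock actually treats readers in interrupt context.

```python
class WriterPreferringRWLock:
    """Toy rwlock model; hypothetical, not the kernel's implementation."""

    def __init__(self, readers_wait_for_pending_writer=True):
        # Assumption under test: do new readers queue behind a waiting writer?
        self.readers_wait = readers_wait_for_pending_writer
        self.readers = []          # contexts holding the lock shared
        self.writer = None         # context holding the lock exclusive
        self.pending_writers = []  # contexts spinning for exclusive access

    def try_read(self, ctx):
        if self.writer is not None:
            return False
        if self.readers_wait and self.pending_writers:
            return False
        self.readers.append(ctx)
        return True

    def try_write(self, ctx):
        if self.writer is None and not self.readers:
            self.writer = ctx
            return True
        if ctx not in self.pending_writers:
            self.pending_writers.append(ctx)
        return False


def replay_scenario(readers_wait_for_pending_writer=True):
    """Replay the three steps and return the resulting wait-for graph."""
    tasklist_lock = WriterPreferringRWLock(readers_wait_for_pending_writer)
    waits_for = {}

    # CPU1, task context: ptrace_check_attach() takes the lock shared.
    tasklist_lock.try_read("cpu1-task")

    # CPU2: setpgid() wants the lock exclusive and spins behind the reader.
    if not tasklist_lock.try_write("cpu2"):
        waits_for["cpu2"] = "cpu1-task"

    # CPU1, interrupt context: kill_fasync() -> send_sigio() wants it shared.
    if not tasklist_lock.try_read("cpu1-irq"):
        waits_for["cpu1-irq"] = "cpu2"
        # The interrupted task cannot resume and unlock until the handler returns.
        waits_for["cpu1-task"] = "cpu1-irq"

    return waits_for


def has_cycle(waits_for, start):
    """Follow wait-for edges from start; True if a context is revisited."""
    seen = set()
    node = start
    while node in waits_for:
        if node in seen:
            return True
        seen.add(node)
        node = waits_for[node]
    return False


if __name__ == "__main__":
    for readers_wait in (True, False):
        graph = replay_scenario(readers_wait)
        print("readers wait for pending writer:", readers_wait,
              "-> deadlock:", has_cycle(graph, "cpu2"))
```

With readers queueing behind the pending writer, the model ends up with
the cycle cpu2 -> cpu1-task -> cpu1-irq -> cpu2, i.e. both CPUs spin.  If
the second read_lock is allowed through while the writer is merely
waiting, there is no cycle, so the whole thing hinges on that one
property of the lock.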

I really hope that's just me not seeing something obvious - we had
kill_fasync() called in IRQ handlers since way back and we had
tasklist_lock taken shared without disabling IRQs for just as long.

<goes to sleep, hoping to find "Al, you are a moron, it's obviously OK
for such and such reasons" in the mailbox tomorrow morning>