linux-kernel - Re: possible deadlock in send

Message-ID: <87pna5si0w.fsf@x220.int.ebiederm.org>
Date:   Thu, 11 Jun 2020 11:07:27 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Waiman Long <longman@...hat.com>
Cc:     syzbot <syzbot+a9fb1457d720a55d6dc5@...kaller.appspotmail.com>,
        adobriyan@...il.com, akpm@...ux-foundation.org,
        allison@...utok.net, areber@...hat.com, aubrey.li@...ux.intel.com,
        avagin@...il.com, bfields@...ldses.org, christian@...uner.io,
        cyphar@...har.com, gregkh@...uxfoundation.org, guro@...com,
        jlayton@...nel.org, joel@...lfernandes.org, keescook@...omium.org,
        linmiaohe@...wei.com, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, mhocko@...e.com, mingo@...nel.org,
        oleg@...hat.com, peterz@...radead.org, sargun@...gun.me,
        syzkaller-bugs@...glegroups.com, tglx@...utronix.de,
        viro@...iv.linux.org.uk
Subject: Re: possible deadlock in send_sigio

Waiman Long <longman@...hat.com> writes:

> On 4/4/20 1:55 AM, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    bef7b2a7 Merge tag 'devicetree-for-5.7' of git://git.kerne..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15f39c5de00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=91b674b8f0368e69
>> dashboard link: https://syzkaller.appspot.com/bug?extid=a9fb1457d720a55d6dc5
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1454c3b7e00000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12a22ac7e00000
>>
>> The bug was bisected to:
>>
>> commit 7bc3e6e55acf065500a24621f3b313e7e5998acf
>> Author: Eric W. Biederman <ebiederm@...ssion.com>
>> Date:   Thu Feb 20 00:22:26 2020 +0000
>>
>>      proc: Use a list of inodes to flush from proc
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=165c4acde00000
>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=155c4acde00000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=115c4acde00000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+a9fb1457d720a55d6dc5@...kaller.appspotmail.com
>> Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc")
>>
>> ========================================================
>> WARNING: possible irq lock inversion dependency detected
>> 5.6.0-syzkaller #0 Not tainted
>> --------------------------------------------------------
>> ksoftirqd/0/9 just changed the state of lock:
>> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigio+0xa9/0x340 fs/fcntl.c:800
>> but this lock took another, SOFTIRQ-unsafe lock in the past:
>>   (&pid->wait_pidfd){+.+.}-{2:2}
>>
>>
>> and interrupts could create inverse lock ordering between them.
>>
>>
>> other info that might help us debug this:
>>   Possible interrupt unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&pid->wait_pidfd);
>>                                 local_irq_disable();
>>                                 lock(tasklist_lock);
>>                                 lock(&pid->wait_pidfd);
>>    <Interrupt>
>>      lock(tasklist_lock);
>>
>>   *** DEADLOCK ***
>
> That is a false positive. The qrwlock has the special property that it becomes
> unfair (for read lock) at interrupt context. So unless it is taking a write lock
> in the interrupt context, it won't go into deadlock. The current lockdep code
> does not capture the full semantics of qrwlock leading to this false positive.
>

Whatever it was, it was fixed with:
63f818f46af9 ("proc: Use a dedicated lock in struct pid")

It is classic lock inversion caused by not disabling irqs.

Unless I am completely mistaken, any non-irq code path that does:
	write_lock_irq(&tasklist_lock);
	spin_lock(&pid->lock);

is susceptible to deadlock with:
	spin_lock(&pid->lock);
	<Interrupt>
	read_lock(&tasklist_lock);

It remains a lock inversion even when only a read lock is taken in
irq context.
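
The inversion described above can be modeled in user space. What follows is
a minimal sketch, not kernel code and not how lockdep is implemented: it
only records, per lock, whether it was ever taken from interrupt context and
whether it was ever taken with interrupts enabled, plus which lock was
acquired while another was held. All names (struct model, record_acquire,
has_irq_inversion, scenario_has_inversion) are hypothetical. The check looks
only at direct dependencies and ignores read versus write acquisition. The
"irqs off" variant models the general remedy of disabling interrupts around
the inner lock; that is an assumption for illustration, not a description of
what the commit cited above changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 4

/* Usage facts recorded per lock (a simplification of what lockdep tracks). */
struct lock_info {
	bool used_in_irq;    /* ever acquired from interrupt context */
	bool taken_irqs_on;  /* ever acquired with interrupts enabled */
};

struct model {
	struct lock_info locks[MAX_LOCKS];
	bool dep[MAX_LOCKS][MAX_LOCKS]; /* dep[a][b]: b acquired while a held */
};

/* Record one acquisition; held is the lock already held, or -1 for none. */
static void record_acquire(struct model *m, int lock, int held,
			   bool in_irq, bool irqs_on)
{
	if (in_irq)
		m->locks[lock].used_in_irq = true;
	if (irqs_on)
		m->locks[lock].taken_irqs_on = true;
	if (held >= 0)
		m->dep[held][lock] = true;
}

/*
 * Report an inversion when a lock that is used in interrupt context (a)
 * has been held while acquiring a lock (b) that is elsewhere taken with
 * interrupts enabled. Only direct a -> b dependencies are checked.
 */
static bool has_irq_inversion(const struct model *m)
{
	for (int a = 0; a < MAX_LOCKS; a++)
		for (int b = 0; b < MAX_LOCKS; b++)
			if (m->dep[a][b] && m->locks[a].used_in_irq &&
			    m->locks[b].taken_irqs_on)
				return true;
	return false;
}

enum { TASKLIST_LOCK, PID_LOCK };

/* Replay the two code paths from the message above in the model. */
bool scenario_has_inversion(bool pid_lock_irqs_off)
{
	struct model m;

	memset(&m, 0, sizeof(m));

	/* Path 1: write_lock_irq(&tasklist_lock); spin_lock(&pid->lock); */
	record_acquire(&m, TASKLIST_LOCK, -1, false, false);
	record_acquire(&m, PID_LOCK, TASKLIST_LOCK, false, false);

	/* Path 2: pid->lock taken alone, with or without irqs disabled. */
	record_acquire(&m, PID_LOCK, -1, false, !pid_lock_irqs_off);

	/* Interrupt handler: read_lock(&tasklist_lock); */
	record_acquire(&m, TASKLIST_LOCK, -1, true, false);

	return has_irq_inversion(&m);
}
```

Calling scenario_has_inversion(false) replays the case where pid->lock is
taken with interrupts enabled and reports the inversion; calling it with
true replays the same paths with interrupts disabled around pid->lock and
reports none.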

Eric