linux-kernel - Re: [syzbot] possible deadlock in io_sq_thread_finish

Message-ID: <282ce2ab-1429-815c-c11f-e3e9d36ef750@gmail.com>
Date:   Sun, 7 Mar 2021 13:20:53 +0000
From:   Pavel Begunkov <asml.silence@...il.com>
To:     syzbot <syzbot+ac39856cb1b332dbbdda@...kaller.appspotmail.com>,
        axboe@...nel.dk, io-uring@...r.kernel.org,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] possible deadlock in io_sq_thread_finish

On 07/03/2021 12:39, Pavel Begunkov wrote:
> On 07/03/2021 09:49, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    a38fd874 Linux 5.12-rc2
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
>> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+ac39856cb1b332dbbdda@...kaller.appspotmail.com
> 
> Legit error, park() might take an sqd lock, and then we take it again.
> I'll patch it up

I was wrong, it looks fine: io_put_sq_data() and io_sq_thread_park()
don't nest. I wonder if that's a false positive due to conditional
locking, as below:

if (sqd->thread == current)
	return;
mutex_lock(&sqd->lock);
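
For readers outside io_uring, here is a rough userspace sketch of that
conditional-locking shape, using pthreads. It is only an illustration:
struct sq_data, sq_park() and sq_unpark() are hypothetical stand-ins, not
the fs/io_uring.c code, and it does not model lockdep or say whether the
report is a false positive.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's sqd object. */
struct sq_data {
	pthread_mutex_t lock;
	pthread_t thread;	/* the dedicated submission thread */
	bool has_thread;	/* whether 'thread' is valid */
};

/*
 * Take sqd->lock unless the caller is the submission thread itself.
 * Returns true if the lock was taken, so the caller knows whether
 * it has to release it again.
 */
static bool sq_park(struct sq_data *sqd)
{
	assert(sqd != NULL);

	/* Same shape as: if (sqd->thread == current) return; */
	if (sqd->has_thread && pthread_equal(sqd->thread, pthread_self()))
		return false;

	pthread_mutex_lock(&sqd->lock);
	return true;
}

/* Release the lock only if sq_park() actually took it. */
static void sq_unpark(struct sq_data *sqd, bool locked)
{
	if (locked)
		pthread_mutex_unlock(&sqd->lock);
}
```

The point of the sketch is that whether the mutex is held after sq_park()
depends on who the caller is, so the pairing of lock and unlock can't be
read off the call sites alone.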

> 
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 5.12.0-rc2-syzkaller #0 Not tainted
>> --------------------------------------------
>> kworker/u4:7/7615 is trying to acquire lock:
>> ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop fs/io_uring.c:7099 [inline]
>> ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data fs/io_uring.c:7115 [inline]
>> ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>>
>> but task is already holding lock:
>> ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
>> ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>        ----
>>   lock(&sqd->lock);
>>   lock(&sqd->lock);
>>
>>  *** DEADLOCK ***
>>
>>  May be due to missing lock nesting notation
>>
>> 3 locks held by kworker/u4:7/7615:
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
>>  #0: ffff888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2246
>>  #1: ffffc900023a7da8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
>>  #2: ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
>>  #2: ffff888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082
>>
>> stack backtrace:
>> CPU: 1 PID: 7615 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Workqueue: events_unbound io_ring_exit_work
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:79 [inline]
>>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>>  print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
>>  check_deadlock kernel/locking/lockdep.c:2872 [inline]
>>  validate_chain kernel/locking/lockdep.c:3661 [inline]
>>  __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
>>  lock_acquire kernel/locking/lockdep.c:5510 [inline]
>>  lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
>>  __mutex_lock_common kernel/locking/mutex.c:946 [inline]
>>  __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
>>  io_sq_thread_stop fs/io_uring.c:7099 [inline]
>>  io_put_sq_data fs/io_uring.c:7115 [inline]
>>  io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>>  io_ring_ctx_free fs/io_uring.c:8408 [inline]
>>  io_ring_exit_work+0x82/0x9a0 fs/io_uring.c:8539
>>  process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
>>  worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
>>  kthread+0x3b1/0x4a0 kernel/kthread.c:292
>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@...glegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>
> 

-- 
Pavel Begunkov