linux-kernel - Re: [syzbot] [block?] possible deadlock in blk_mq_submit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241123235958.1489-1-hdanton@sina.com>
Date: Sun, 24 Nov 2024 07:59:58 +0800
From: Hillf Danton <hdanton@...a.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: syzbot <syzbot+5218c85078236fc46227@...kaller.appspotmail.com>,
	axboe@...nel.dk,
	linux-block@...r.kernel.org,
	Boqun Feng <boqun.feng@...il.com>,
	linux-kernel@...r.kernel.org,
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [block?] possible deadlock in blk_mq_submit_bio

On Sat, 23 Nov 2024 07:37:22 -0800
> syzbot found the following issue on:
> 
> HEAD commit:    06afb0f36106 Merge tag 'trace-v6.13' of git://git.kernel.o..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148bfec0580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b011a14ee4cb9480
> dashboard link: https://syzkaller.appspot.com/bug?extid=5218c85078236fc46227
> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: i386
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-06afb0f3.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/aae0561fd279/vmlinux-06afb0f3.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/faa3af3fa7ce/bzImage-06afb0f3.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+5218c85078236fc46227@...kaller.appspotmail.com
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.12.0-syzkaller-07834-g06afb0f36106 #0 Not tainted
> ------------------------------------------------------
> kswapd0/112 is trying to acquire lock:
> ffff88801f3f1438 (&q->q_usage_counter(io)#68){++++}-{0:0}, at: bio_queue_enter block/blk.h:79 [inline]
> ffff88801f3f1438 (&q->q_usage_counter(io)#68){++++}-{0:0}, at: blk_mq_submit_bio+0x7ca/0x24c0 block/blk-mq.c:3092
> 
> but task is already holding lock:
> ffffffff8df4de60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xcd9/0x18f0 mm/vmscan.c:6976
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (fs_reclaim){+.+.}-{0:0}:
>        __fs_reclaim_acquire mm/page_alloc.c:3851 [inline]
>        fs_reclaim_acquire+0x102/0x150 mm/page_alloc.c:3865
>        might_alloc include/linux/sched/mm.h:318 [inline]
>        slab_pre_alloc_hook mm/slub.c:4036 [inline]
>        slab_alloc_node mm/slub.c:4114 [inline]
>        __do_kmalloc_node mm/slub.c:4263 [inline]
>        __kmalloc_node_noprof+0xb7/0x440 mm/slub.c:4270
>        __kvmalloc_node_noprof+0xad/0x1a0 mm/util.c:658
>        sbitmap_init_node+0x1ca/0x770 lib/sbitmap.c:132
>        scsi_realloc_sdev_budget_map+0x2c7/0x610 drivers/scsi/scsi_scan.c:246
>        scsi_add_lun+0x11b4/0x1fd0 drivers/scsi/scsi_scan.c:1106
>        scsi_probe_and_add_lun+0x4fa/0xda0 drivers/scsi/scsi_scan.c:1287
>        __scsi_add_device+0x24b/0x290 drivers/scsi/scsi_scan.c:1622
>        ata_scsi_scan_host+0x215/0x780 drivers/ata/libata-scsi.c:4575
>        async_run_entry_fn+0x9c/0x530 kernel/async.c:129
>        process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
>        process_scheduled_works kernel/workqueue.c:3310 [inline]
>        worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
>        kthread+0x2c1/0x3a0 kernel/kthread.c:389
>        ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> 
> -> #0 (&q->q_usage_counter(io)#68){++++}-{0:0}:
>        check_prev_add kernel/locking/lockdep.c:3161 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3280 [inline]
>        validate_chain kernel/locking/lockdep.c:3904 [inline]
>        __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5226
>        lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
>        __bio_queue_enter+0x4c6/0x740 block/blk-core.c:361
>        bio_queue_enter block/blk.h:79 [inline]

Another splat in bio_queue_enter() [1]

[1] https://lore.kernel.org/lkml/20241104112732.3144-1-hdanton@sina.com/

>        blk_mq_submit_bio+0x7ca/0x24c0 block/blk-mq.c:3092
>        __submit_bio+0x384/0x540 block/blk-core.c:629
>        __submit_bio_noacct_mq block/blk-core.c:710 [inline]
>        submit_bio_noacct_nocheck+0x698/0xd70 block/blk-core.c:739
>        submit_bio_noacct+0x93a/0x1e20 block/blk-core.c:868
>        swap_writepage_bdev_async mm/page_io.c:449 [inline]
>        __swap_writepage+0x3a3/0xf50 mm/page_io.c:472
>        swap_writepage+0x403/0x1040 mm/page_io.c:288
>        pageout+0x3b2/0xaa0 mm/vmscan.c:689
>        shrink_folio_list+0x3025/0x42d0 mm/vmscan.c:1367
>        evict_folios+0x6d6/0x1970 mm/vmscan.c:4589
>        try_to_shrink_lruvec+0x612/0x9b0 mm/vmscan.c:4784
>        shrink_one+0x3e3/0x7b0 mm/vmscan.c:4822
>        shrink_many mm/vmscan.c:4885 [inline]
>        lru_gen_shrink_node mm/vmscan.c:4963 [inline]
>        shrink_node+0xbbc/0x3ed0 mm/vmscan.c:5943
>        kswapd_shrink_node mm/vmscan.c:6771 [inline]
>        balance_pgdat+0xc1f/0x18f0 mm/vmscan.c:6963
>        kswapd+0x5f8/0xc30 mm/vmscan.c:7232
>        kthread+0x2c1/0x3a0 kernel/kthread.c:389
>        ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
>        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(fs_reclaim);
>                                lock(&q->q_usage_counter(io)#68);
>                                lock(fs_reclaim);
>   rlock(&q->q_usage_counter(io)#68);
> 
>  *** DEADLOCK ***