linux-kernel - Inconsistent lock state for sbq->ws[i].wait

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <ZmtB0USK0+zdRe0R@ideak-desk.fi.intel.com>
Date: Thu, 13 Jun 2024 22:00:33 +0300
From: Imre Deak <imre.deak@...el.com>
To: Jens Axboe <axboe@...nel.dk>,
	"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>
Cc: linux-kernel@...r.kernel.org
Subject: Inconsistent lock state for sbq->ws[i].wait

Hi,

the Intel GFX CI regularly triggers the WARN below, which seems to have
started with 6.5-rc1 (6.9.0-rc6 being the first version with still
available logs where I could check the stacktrace matching exactly) and
always happens on - multiple different - Sandybridge machines, but not
on any other old or new Intel platforms in our CI farm. I'm not sure
about the root cause, so thought to send this to blk-mq-sched.c and
scsi_lib.c maintainers for any clues. For the history and full logs
please see:

https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/8852

Thanks,
Imre

<4> [211.780637] ================================
<4> [211.780643] WARNING: inconsistent lock state
<4> [211.780648] 6.10.0-rc3-Patchwork_133624v5-ga7d697ac042a+ #1 Not tainted
<4> [211.780656] --------------------------------
<4> [211.780661] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
<4> [211.780668] kworker/6:2/110 [HC0[0]:SC0[0]:HE0:SE1] takes:
<4> [211.780676] ffff8881030b9218 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x4f7/0x740
<4> [211.780695] {IN-SOFTIRQ-W} state was registered at:
<4> [211.780701]   lock_acquire+0xd1/0x2f0
<4> [211.780710]   _raw_spin_lock_irqsave+0x3d/0x60
<4> [211.780719]   __wake_up_common_lock+0x1d/0x60
<4> [211.780726]   sbitmap_queue_wake_up+0x74/0xb0
<4> [211.780735]   sbitmap_queue_clear+0x3b/0x60
<4> [211.780742]   __blk_mq_free_request+0x71/0xd0
<4> [211.780750]   scsi_end_request+0xf4/0x300
<4> [211.780758]   scsi_io_completion+0x40/0x590
<4> [211.780764]   blk_complete_reqs+0x3b/0x50
<4> [211.780771]   handle_softirqs+0xbf/0x3f0
<4> [211.780778]   irq_exit_rcu+0x87/0xc0
<4> [211.780784]   common_interrupt+0xbb/0xe0
<4> [211.780793]   asm_common_interrupt+0x26/0x40
<4> [211.780801]   lock_is_held_type+0x54/0x140
<4> [211.780808]   __might_resched+0x23a/0x2b0
<4> [211.780816]   dput.part.0+0x18/0x400
<4> [211.780824]   step_into+0x3b2/0x720
<4> [211.780832]   link_path_walk.part.0+0x279/0x390
<4> [211.780840]   path_lookupat.isra.0+0x39/0x150
<4> [211.780848]   filename_lookup+0xbb/0x140
<4> [211.780856]   vfs_statx+0x84/0x190
<4> [211.780863]   vfs_fstatat+0x55/0xc0
<4> [211.780869]   __do_sys_newlstat+0x33/0x70
<4> [211.780876]   do_syscall_64+0x66/0x140
<4> [211.780883]   entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [211.780891] irq event stamp: 88280
<4> [211.780897] hardirqs last  enabled at (88279): [<ffffffff81d78f28>] _raw_spin_unlock_irqrestore+0x58/0x70
<4> [211.780910] hardirqs last disabled at (88280): [<ffffffff81d78bb5>] _raw_spin_lock_irq+0x45/0x50
<4> [211.780922] softirqs last  enabled at (85746): [<ffffffffa013c94a>] tg3_mdio_read+0x5a/0x90 [tg3]
<4> [211.780940] softirqs last disabled at (85744): [<ffffffffa013c924>] tg3_mdio_read+0x34/0x90 [tg3]
<4> [211.780956] 
other info that might help us debug this:
<4> [211.780963]  Possible unsafe locking scenario:
<4> [211.780971]        CPU0
<4> [211.780975]        ----
<4> [211.780979]   lock(&sbq->ws[i].wait);
<4> [211.780986]   <Interrupt>
<4> [211.780990]     lock(&sbq->ws[i].wait);
<4> [211.780996] 
 *** DEADLOCK ***
<4> [211.781004] 5 locks held by kworker/6:2/110:
<4> [211.781010]  #0: ffff888100092d48 ((wq_completion)events_freezable_pwr_efficient){+.+.}-{0:0}, at: process_scheduled_works+0x66/0x690
<4> [211.781031]  #1: ffffc900003ebe60 ((work_completion)(&(&ev->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x66/0x690
<4> [211.781049]  #2: ffffffff8273b8c0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x137/0x430
<4> [211.781066]  #3: ffff8881030b9218 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x4f7/0x740
<4> [211.781082]  #4: ffff888106089db0 (&hctx->dispatch_wait_lock){+...}-{2:2}, at: blk_mq_dispatch_rq_list+0x515/0x740
<4> [211.781099] 
stack backtrace:
<4> [211.781105] CPU: 6 PID: 110 Comm: kworker/6:2 Not tainted 6.10.0-rc3-Patchwork_133624v5-ga7d697ac042a+ #1
<4> [211.781117] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4> [211.781126] Workqueue: events_freezable_pwr_efficient disk_events_workfn (events_freezable_pwr_ef)
<4> [211.781141] Call Trace:
<4> [211.781147]  <TASK>
<4> [211.781151]  dump_stack_lvl+0x82/0xd0
<4> [211.781160]  mark_lock.part.0+0x87b/0xcd0
<4> [211.781169]  ? lock_acquire+0xd1/0x2f0
<4> [211.781178]  mark_held_locks+0x49/0x80
<4> [211.781187]  lockdep_hardirqs_on_prepare+0xf6/0x1b0
<4> [211.781195]  trace_hardirqs_on+0x4b/0xd0
<4> [211.781204]  _raw_spin_unlock_irq+0x23/0x50
<4> [211.781212]  __blk_mq_alloc_driver_tag+0x109/0x180
<4> [211.781221]  blk_mq_dispatch_rq_list+0x70b/0x740
<4> [211.781231]  __blk_mq_sched_dispatch_requests+0xa7/0x620
<4> [211.781243]  blk_mq_sched_dispatch_requests+0x27/0x60
<4> [211.781252]  blk_mq_run_hw_queue+0x161/0x430
<4> [211.781261]  blk_execute_rq+0x134/0x2b0
<4> [211.781271]  scsi_execute_cmd+0x174/0x420
<4> [211.781280]  scsi_test_unit_ready+0x6b/0x100
<4> [211.781291]  sd_check_events+0xf1/0x140
<4> [211.781300]  disk_check_events+0x38/0x100
<4> [211.781309]  process_scheduled_works+0x363/0x690
<4> [211.781318]  ? process_scheduled_works+0x66/0x690
<4> [211.781327]  ? process_scheduled_works+0x66/0x690
<4> [211.781337]  worker_thread+0x13d/0x2d0
<4> [211.781345]  ? __pfx_worker_thread+0x10/0x10
<4> [211.781354]  kthread+0xed/0x120
<4> [211.781361]  ? __pfx_kthread+0x10/0x10
<4> [211.781369]  ret_from_fork+0x2f/0x50
<4> [211.781377]  ? __pfx_kthread+0x10/0x10
<4> [211.781384]  ret_from_fork_asm+0x1a/0x30
<4> [211.781396]  </TASK>