linux-kernel - Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw

Message-ID: <6eb6abcf-26aa-473d-843e-428ae0f38203@acm.org>
Date: Tue, 6 Jan 2026 04:29:36 -0800
From: Bart Van Assche <bvanassche@....org>
To: djiony2011@...il.com, ming.lei@...hat.com
Cc: axboe@...nel.dk, gregkh@...uxfoundation.org, ionut.nechita@...driver.com,
 linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 muchun.song@...ux.dev, sashal@...nel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when
 called from interrupt context

On 1/6/26 3:14 AM, djiony2011@...il.com wrote:
> [Mon Dec 22 10:18:18 2025] WARNING: CPU: 190 PID: 2041 at block/blk-mq.c:2291 blk_mq_run_hw_queue+0x1fa/0x260
> [Mon Dec 22 10:18:18 2025] Modules linked in:
> [Mon Dec 22 10:18:18 2025] CPU: 190 PID: 2041 Comm: kworker/u385:1 Tainted: G        W          6.6.0-1-rt-amd64 #1  Debian 6.6.71-1

6.6.71 is far behind Jens' for-next branch. Please test kernel patches
intended for the upstream kernel against Jens' for-next branch.

> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  <TASK>
> [Mon Dec 22 10:18:18 2025]  blk_mq_run_hw_queues+0x6c/0x130
> [Mon Dec 22 10:18:18 2025]  blk_queue_start_drain+0x12/0x40
> [Mon Dec 22 10:18:18 2025]  blk_mq_destroy_queue+0x37/0x70
> [Mon Dec 22 10:18:18 2025]  __scsi_remove_device+0x6a/0x180
> [Mon Dec 22 10:18:18 2025]  scsi_alloc_sdev+0x357/0x360
> [Mon Dec 22 10:18:18 2025]  scsi_probe_and_add_lun+0x8ac/0xc00
> [Mon Dec 22 10:18:18 2025]  __scsi_scan_target+0xf0/0x520
> [Mon Dec 22 10:18:18 2025]  scsi_scan_channel+0x57/0x90
> [Mon Dec 22 10:18:18 2025]  scsi_scan_host_selected+0xd4/0x110
> [Mon Dec 22 10:18:18 2025]  do_scan_async+0x1c/0x190
> [Mon Dec 22 10:18:18 2025]  async_run_entry_fn+0x2f/0x130
> [Mon Dec 22 10:18:18 2025]  process_one_work+0x175/0x370
> [Mon Dec 22 10:18:18 2025]  worker_thread+0x280/0x390
> [Mon Dec 22 10:18:18 2025]  kthread+0xdd/0x110
> [Mon Dec 22 10:18:18 2025]  ret_from_fork+0x31/0x50
> [Mon Dec 22 10:18:18 2025]  ret_from_fork_asm+0x1b/0x30

Where in the above call stack is the code that disables interrupts?

> 3. **The actual problem on PREEMPT_RT**: There's a preceding "scheduling while atomic"
>     error that provides the real context:
> 
> [Mon Dec 22 10:18:18 2025] BUG: scheduling while atomic: kworker/u385:1/2041/0x00000002
> [Mon Dec 22 10:18:18 2025] Call Trace:
> [Mon Dec 22 10:18:18 2025]  dump_stack_lvl+0x37/0x50
> [Mon Dec 22 10:18:18 2025]  __schedule_bug+0x52/0x60
> [Mon Dec 22 10:18:18 2025]  __schedule+0x87d/0xb10
> [Mon Dec 22 10:18:18 2025]  rt_mutex_schedule+0x21/0x40
> [Mon Dec 22 10:18:18 2025]  rt_mutex_slowlock_block.constprop.0+0x33/0x170
> [Mon Dec 22 10:18:18 2025]  __rt_mutex_slowlock_locked.constprop.0+0xc4/0x1e0
> [Mon Dec 22 10:18:18 2025]  mutex_lock+0x44/0x60
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance_cpuslocked+0x41/0x110
> [Mon Dec 22 10:18:18 2025]  __cpuhp_state_add_instance+0x48/0xd0
> [Mon Dec 22 10:18:18 2025]  blk_mq_realloc_hw_ctxs+0x405/0x420
> [Mon Dec 22 10:18:18 2025]  blk_mq_init_allocated_queue+0x10a/0x480

How is the above call stack related to the reported problem? The above
call stack comes from request queue allocation, while the reported
problem happens during request queue destruction.

> I apologize for the confusion in my commit message. Should I:
> 1. Revise the commit message to accurately describe the blk_queue_start_drain() path?
> 2. Add details about the PREEMPT_RT context causing the atomic state?

The answer to both questions is yes.

Thanks,

Bart.