Message-ID: <20260106144023.381884-2-ionut.nechita@windriver.com>
Date: Tue, 6 Jan 2026 16:40:21 +0200
From: Ionut Nechita <djiony2011@...il.com>
To: bvanassche@....org
Cc: axboe@...nel.dk,
gregkh@...uxfoundation.org,
ionut.nechita@...driver.com,
linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org,
ming.lei@...hat.com,
muchun.song@...ux.dev,
sashal@...nel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v2 2/2] block: Fix WARN_ON in blk_mq_run_hw_queue when called from interrupt context
Hi Bart,
Thank you for the thorough and insightful review. You've identified several critical issues with my submission that I need to address.
> 6.6.71 is pretty far away from Jens' for-next branch. Please use Jens'
> for-next branch for testing kernel patches intended for the upstream kernel.
You're absolutely right. I was testing on the stable Debian kernel (6.6.71-rt), which is where the issue was originally reported. I will fetch Jens' for-next branch, test there, and make sure the issue reproduces before resubmitting.
> Where in the above call stack is the code that disables interrupts?
This was poorly worded on my part, and I apologize for the confusion. The issue is NOT "interrupt context" in the hardirq sense.
What's actually happening:
- **Context:** kworker thread (async SCSI device scan)
- **State:** Running with preemption disabled (atomic context, not hardirq)
- **Path:** Queue destruction during device probe error cleanup
- **Trigger:** On PREEMPT_RT, in_interrupt() returns true when preemption is disabled, even in process context
The WARN_ON in blk_mq_run_hw_queue() at line 2291 is:
WARN_ON_ONCE(!async && in_interrupt());
On PREEMPT_RT, this check fires because:
1. blk_freeze_queue_start() calls blk_mq_run_hw_queues(q, false) ← async=false
2. This eventually calls blk_mq_run_hw_queue() with async=false
3. in_interrupt() returns true (because preempt_count indicates an atomic state)
4. WARN_ON triggers
So it's not "interrupt context"; it's atomic context (preemption disabled) being detected by in_interrupt() on a PREEMPT_RT kernel.
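To make that concrete, here is a minimal user-space model of the check (a sketch only: run_hw_queue() and in_interrupt_model are stand-ins I made up for illustration, not the real kernel functions; only the WARN_ON_ONCE condition itself comes from blk-mq):
```
/* Toy user-space model of the check, for illustration only. */
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's in_interrupt(); here we simply simulate the
 * state reported in the failing path on the RT kernel. */
static bool in_interrupt_model = true;  /* "atomic context" detected */

static void run_hw_queue(bool async)
{
	/* Mirrors WARN_ON_ONCE(!async && in_interrupt()) in
	 * blk_mq_run_hw_queue(): a synchronous run is only legal outside
	 * of what the kernel considers interrupt/atomic context. */
	if (!async && in_interrupt_model) {
		fprintf(stderr, "WARN: sync queue run in atomic context\n");
		return;
	}
	printf("queue run dispatched (async=%d)\n", async);
}

int main(void)
{
	run_hw_queue(false);  /* models the warning seen in the report */
	run_hw_queue(true);   /* deferring the run does not trip the check */
	return 0;
}
```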
> How is the above call stack related to the reported problem? The above
> call stack is about request queue allocation while the reported problem
> happens during request queue destruction.
You're absolutely correct, and I apologize for the confusion. I mistakenly included two different call stacks in my commit message:
1. **"scheduling while atomic" during blk_mq_realloc_hw_ctxs** - This was from queue allocation and is a DIFFERENT issue. It should NOT have been included.
2. **WARN_ON during blk_queue_start_drain** - This is the ACTUAL issue that my patch addresses (queue destruction path).
I will revise the commit message to remove the unrelated allocation stack trace and focus solely on the queue destruction path.
>> I apologize for the confusion in my commit message. Should I:
>> 1. Revise the commit message to accurately describe the blk_queue_start_drain() path?
>> 2. Add details about the PREEMPT_RT context causing the atomic state?
>
> The answer to both questions is yes.
Understood. I will prepare a v3 with the following corrections:
1. **Test on Jens' for-next branch** - Fetch, reproduce, and validate the fix on the upstream development tree
2. **Accurate context description** - Replace "IRQ thread context" with "kworker context with preemption disabled (atomic context on RT)"
3. **Single, clear call stack** - Remove the unrelated allocation stack trace and focus only on the destruction path:
```
scsi_alloc_sdev (error path)
→ __scsi_remove_device
→ blk_mq_destroy_queue
→ blk_queue_start_drain
→ blk_freeze_queue_start
→ blk_mq_run_hw_queues(q, false) ← Problem: async=false
```
4. **Explain PREEMPT_RT specifics** - Clearly describe why in_interrupt() returns true in atomic context on an RT kernel, and how changing to async=true avoids the problem (see the sketch after this list)
5. **Accurate problem statement** - This is about avoiding synchronous queue runs in atomic context on RT, not about MSI-X IRQ thread contention (that was a misunderstanding on my part)
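To illustrate item 4, here is a minimal user-space sketch of the async-vs-sync behaviour I intend to describe in the commit message (again a toy model: pending_work, dispatch_requests() and run_hw_queue() are made-up stand-ins for the kblockd deferral, not kernel code):
```
/* Toy model: async=true defers the queue run to a "worker" that executes
 * later in a safe (preemptible) context instead of dispatching inline. */
#include <stdbool.h>
#include <stdio.h>

typedef void (*work_fn)(void);

static work_fn pending_work;   /* stand-in for kblockd delayed work */

static void dispatch_requests(void)
{
	printf("dispatching requests (preemptible context)\n");
}

static void run_hw_queue(bool async)
{
	if (async) {
		pending_work = dispatch_requests;  /* schedule for later */
		printf("queue run deferred to worker\n");
		return;
	}
	dispatch_requests();  /* inline run: caller must not be atomic */
}

int main(void)
{
	/* Caller in atomic context (blk_freeze_queue_start()-like): */
	run_hw_queue(true);   /* safe: only schedules work */

	/* Later, the worker runs in process context: */
	if (pending_work)
		pending_work();
	return 0;
}
```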
I'll respond again once I've validated on for-next and have a corrected v3 ready.
Thank you again for the detailed feedback.
Best regards,
Ionut
--
2.52.0