Message-ID: <20260106113626.7319-2-ionut.nechita@windriver.com>
Date: Tue,  6 Jan 2026 13:36:27 +0200
From: djiony2011@...il.com
To: muchun.song@...ux.dev
Cc: axboe@...nel.dk,
	gregkh@...uxfoundation.org,
	ionut.nechita@...driver.com,
	linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	ming.lei@...hat.com,
	sashal@...nel.org,
	stable@...r.kernel.org
Subject: Re: [PATCH v2 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path

From: Ionut Nechita <ionut.nechita@...driver.com>

Hi Muchun,

Thank you for the detailed review. Your questions about the memory barriers go
right to the heart of the problem and expose fundamental issues with my approach.

> Memory barriers must be used in pairs. So how to synchronize?

> Why we need another barrier? What order does this barrier guarantee?

You're right to ask these questions. After careful consideration and discussion
with Ming Lei, I've concluded that the memory barrier approach in this patch is
flawed and insufficient.

The fundamental problem is:
1. Memory barriers need proper pairing on both read and write sides
2. The write-side barriers would need to be inserted at MULTIPLE call sites
   throughout the block layer - everywhere work is added before calling
   blk_mq_run_hw_queue()
3. This is exactly why the original commit 679b1874eba7 chose the lock-based
   approach, noting that "memory barrier is not easy to be maintained"

My patch attempted to add barriers only in blk_mq_run_hw_queue(), but didn't
address the pairing barriers needed at all the call sites that add work to
dispatch lists/sw queues. This makes the synchronization incomplete.
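
To make the pairing requirement concrete, here is roughly what correct pairing
would have to look like. This is only an illustration, not actual block layer
code; the helpers marked "hypothetical" are made up for the example:

  /* Write side: every path that queues work before kicking the hw queue. */
  static void submit_side(struct blk_mq_hw_ctx *hctx, struct request *rq)
  {
          add_to_dispatch(hctx, rq);     /* hypothetical: publish the work */
          smp_mb();                      /* pairs with smp_mb() in quiesce_side() */
          if (!blk_queue_quiesced(hctx->queue))
                  blk_mq_run_hw_queue(hctx, false);
  }

  /* Read side: the quiesce path must carry the matching barrier. */
  static void quiesce_side(struct request_queue *q)
  {
          blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
          smp_mb();                      /* flag store ordered before pending check */
          if (queue_has_pending(q))      /* hypothetical */
                  flush_pending_work(q); /* hypothetical */
  }

Either the submitter sees QUEUE_FLAG_QUIESCED and backs off, or the quiescer
sees the newly queued work; the barrier pair rules out missing both. The
write-side half would have to be replicated at every call site that adds work,
which is exactly what makes this unmaintainable.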

## New approach: dedicated raw_spinlock

I'm abandoning the memory barrier approach and preparing a new patch that uses
a dedicated raw_spinlock_t (quiesce_sync_lock) instead of the general-purpose
queue_lock.

The key differences from the current problematic code:
- Current: uses queue_lock (spinlock_t), which becomes a sleeping rt_mutex on PREEMPT_RT
- New: uses quiesce_sync_lock (raw_spinlock_t), which stays a real spinning lock on PREEMPT_RT

Why raw_spinlock is safe:
- Critical section is provably short (only flag and counter checks)
- No sleeping operations under the lock
- Specific to quiesce synchronization, not general queue operations

This approach:
- Maintains the correct synchronization from 679b1874eba7
- Avoids sleeping in RT kernel's IRQ thread context
- Simpler and more maintainable than memory barrier pairing across many call sites
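
Roughly, the direction looks like this (a simplified sketch, not the final
patch; the protected check mirrors what 679b1874eba7 protects today, and the
details may still change):

  /* struct request_queue gains a dedicated lock, used only for quiesce sync: */
          raw_spinlock_t  quiesce_sync_lock;   /* raw_spin_lock_init() at queue alloc */

  /* blk_mq_run_hw_queue() takes it instead of q->queue_lock: */
          struct request_queue *q = hctx->queue;
          unsigned long flags;
          bool need_run;

          raw_spin_lock_irqsave(&q->quiesce_sync_lock, flags);
          need_run = !blk_queue_quiesced(q) && blk_mq_hctx_has_pending(hctx);
          raw_spin_unlock_irqrestore(&q->quiesce_sync_lock, flags);

          if (!need_run)
                  return;

The quiesce/unquiesce paths take the same lock around flipping
QUEUE_FLAG_QUIESCED, so the two sides still serialize exactly as they do with
queue_lock today, but the lock never turns into a sleeping rt_mutex on
PREEMPT_RT.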

I'll send the new patch shortly. Thank you for catching these issues before
they made it into the kernel.

Best regards,
Ionut
