Message-ID: <CADUfDZq54SYfc6XNa6b3i7oktLfL+T-C-DSfka5wyh1WafbowA@mail.gmail.com>
Date: Fri, 2 May 2025 07:44:04 -0700
From: Caleb Sander Mateos <csander@...estorage.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Christoph Hellwig <hch@...radead.org>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] block: avoid hctx spinlock for plug with multiple queues

Hi Jens,

Christoph has reviewed this series. Would you mind queueing it up for 6.16?

Thanks,
Caleb

On Fri, Apr 25, 2025 at 6:17 PM Caleb Sander Mateos
<csander@...estorage.com> wrote:
>
> blk_mq_flush_plug_list() has a fast path if all requests in the plug
> are destined for the same request_queue. It calls ->queue_rqs() with the
> whole batch of requests, falling back on ->queue_rq() for any requests
> not handled by ->queue_rqs(). However, if the requests are destined for
> multiple queues, blk_mq_flush_plug_list() has a slow path that calls
> blk_mq_dispatch_list() repeatedly to filter the requests by ctx/hctx.
> Each queue's requests are inserted into the hctx's dispatch list under a
> spinlock, then __blk_mq_sched_dispatch_requests() takes them out of the
> dispatch list (taking the spinlock again), and finally
> blk_mq_dispatch_rq_list() calls ->queue_rq() on each request.
>
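To make the shape of that slow path concrete, here is a rough,
self-contained sketch in plain C. The struct and function names below
(struct hctx, insert_requests(), dispatch_requests(), queue_rq()) are
simplified stand-ins invented for illustration, not the real block-layer
API:

#include <pthread.h>
#include <stddef.h>

struct request { struct request *next; };
struct rq_list { struct request *head; };

struct hctx {
        pthread_mutex_t lock;        /* models the hctx dispatch spinlock */
        struct rq_list dispatch;     /* models the hctx dispatch list */
};

/* Per-request driver submission hook, analogous to ->queue_rq(). */
static int queue_rq(struct request *rq) { (void)rq; return 0; }

/*
 * Slow-path shape: one queue's share of the plug is inserted into the
 * hctx dispatch list under the lock ...
 */
static void insert_requests(struct hctx *h, struct rq_list *rqs)
{
        pthread_mutex_lock(&h->lock);
        h->dispatch = *rqs;          /* splice (assumes an empty list here) */
        rqs->head = NULL;
        pthread_mutex_unlock(&h->lock);
}

/*
 * ... then pulled back off (taking the lock a second time) and issued
 * one request at a time via the per-request hook.
 */
static void dispatch_requests(struct hctx *h)
{
        struct rq_list rqs;

        pthread_mutex_lock(&h->lock);
        rqs = h->dispatch;
        h->dispatch.head = NULL;
        pthread_mutex_unlock(&h->lock);

        for (struct request *rq = rqs.head; rq; rq = rq->next)
                queue_rq(rq);
}
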
> Acquiring the hctx spinlock twice and calling ->queue_rq() instead of
> ->queue_rqs() makes the slow path significantly more expensive. Thus,
> batching more requests into a single plug (e.g. from one io_uring_enter()
> syscall) can counterintuitively hurt performance by causing the plug to span
> multiple queues. We have observed 2-3% of CPU time spent acquiring the
> hctx spinlock alone on workloads issuing requests to multiple NVMe
> devices in the same io_uring SQE batches.
>
> Add a medium path in blk_mq_flush_plug_list() for plugs that don't have
> an elevator and aren't flushed from a schedule, but do span multiple
> queues. Filter the requests by queue and call ->queue_rqs()/->queue_rq()
> on the list of requests destined for each request_queue.
>
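For illustration, here is a self-contained sketch of that per-queue
filtering idea, again with made-up stand-in types rather than the actual
patch: peel off one queue's requests at a time, hand each per-queue list
to a batched submit hook, and fall back to per-request submission for
anything it leaves behind.

#include <stddef.h>

/* Simplified stand-ins invented for this sketch, not the real types. */
struct queue;
struct request { struct request *next; struct queue *q; };
struct rq_list { struct request *head; };

struct queue {
        /* batched submit; may leave unhandled requests on the list */
        void (*queue_rqs)(struct rq_list *rqs);
        /* per-request fallback */
        int (*queue_rq)(struct request *rq);
};

/*
 * Move the requests destined for @q from @plug to @out, leaving any
 * unmatched requests on @plug (this toy version reverses their order).
 */
static void filter_queue(struct rq_list *plug, struct queue *q,
                         struct rq_list *out)
{
        struct request **prev = &plug->head, *rq;

        out->head = NULL;
        while ((rq = *prev) != NULL) {
                if (rq->q == q) {
                        *prev = rq->next;        /* unlink from the plug */
                        rq->next = out->head;    /* push onto per-queue list */
                        out->head = rq;
                } else {
                        prev = &rq->next;
                }
        }
}

/* Dispatch the whole plug one queue at a time, with no hctx lock. */
static void flush_plug(struct rq_list *plug)
{
        while (plug->head != NULL) {
                struct queue *q = plug->head->q;
                struct rq_list rqs;

                filter_queue(plug, q, &rqs);
                q->queue_rqs(&rqs);              /* batched submission */
                for (struct request *rq = rqs.head; rq; rq = rq->next)
                        q->queue_rq(rq);         /* leftovers, one by one */
        }
}
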
> With this change, we no longer see any CPU time spent in _raw_spin_lock
> from blk_mq_flush_plug_list and throughput increases accordingly.
>
> Caleb Sander Mateos (3):
> block: take rq_list instead of plug in dispatch functions
> block: factor out blk_mq_dispatch_queue_requests() helper
> block: avoid hctx spinlock for plug with multiple queues
>
> block/blk-mq.c | 110 +++++++++++++++++++++++++++++++-------------
> block/mq-deadline.c | 2 +-
> 2 files changed, 79 insertions(+), 33 deletions(-)
>
> v2:
> - Leave unmatched requests in plug list instead of building a new list
> - Add Reviewed-by tags
>
> --
> 2.45.2
>