Message-ID: <CADUfDZq54SYfc6XNa6b3i7oktLfL+T-C-DSfka5wyh1WafbowA@mail.gmail.com>
Date: Fri, 2 May 2025 07:44:04 -0700
From: Caleb Sander Mateos <csander@...estorage.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Christoph Hellwig <hch@...radead.org>, linux-block@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/3] block: avoid hctx spinlock for plug with multiple queues

Hi Jens,
Christoph has reviewed this series. Would you mind queueing it up for 6.16?

Thanks,
Caleb

On Fri, Apr 25, 2025 at 6:17 PM Caleb Sander Mateos
<csander@...estorage.com> wrote:
>
> blk_mq_flush_plug_list() has a fast path if all requests in the plug
> are destined for the same request_queue. It calls ->queue_rqs() with the
> whole batch of requests, falling back on ->queue_rq() for any requests
> not handled by ->queue_rqs(). However, if the requests are destined for
> multiple queues, blk_mq_flush_plug_list() has a slow path that calls
> blk_mq_dispatch_list() repeatedly to filter the requests by ctx/hctx.
> Each queue's requests are inserted into the hctx's dispatch list under a
> spinlock, then __blk_mq_sched_dispatch_requests() takes them out of the
> dispatch list (taking the spinlock again), and finally
> blk_mq_dispatch_rq_list() calls ->queue_rq() on each request.
>
> Acquiring the hctx spinlock twice and calling ->queue_rq() instead of
> ->queue_rqs() makes the slow path significantly more expensive. Thus,
> batching more requests into a single plug (e.g. a single io_uring_enter() syscall)
> can counterintuitively hurt performance by causing the plug to span
> multiple queues. We have observed 2-3% of CPU time spent acquiring the
> hctx spinlock alone on workloads issuing requests to multiple NVMe
> devices in the same io_uring SQE batch.
>
> Add a medium path in blk_mq_flush_plug_list() for plugs that neither
> use an elevator nor come from a schedule, but do span multiple queues.
> Filter the requests by queue and call ->queue_rqs()/->queue_rq() on the
> list of requests destined for each request_queue.
>
> With this change, we no longer see any CPU time spent in _raw_spin_lock
> from blk_mq_flush_plug_list and throughput increases accordingly.
>
> Caleb Sander Mateos (3):
>   block: take rq_list instead of plug in dispatch functions
>   block: factor out blk_mq_dispatch_queue_requests() helper
>   block: avoid hctx spinlock for plug with multiple queues
>
>  block/blk-mq.c      | 110 +++++++++++++++++++++++++++++++-------------
>  block/mq-deadline.c |   2 +-
>  2 files changed, 79 insertions(+), 33 deletions(-)
>
> v2:
> - Leave unmatched requests in plug list instead of building a new list
> - Add Reviewed-by tags
>
> --
> 2.45.2
>
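To illustrate the per-queue filtering idea described in the cover letter
above, here is a minimal standalone userspace C sketch. It is not the
kernel patch itself: the types and helpers (struct req, struct queue,
dispatch_batch(), flush_plug_model()) are made up for illustration, with
dispatch_batch() standing in for a driver's ->queue_rqs() batch dispatch.
The sketch repeatedly takes the queue of the first plugged request,
splices out every request destined for that queue while leaving unmatched
requests in place (as v2 of the series does), and hands the matching
batch to a single dispatch call.

/*
 * Standalone userspace model of the per-queue dispatch idea, not the
 * actual kernel code. All names here are invented for illustration.
 */
#include <stdio.h>
#include <stddef.h>

struct queue { const char *name; };

struct req {
	struct queue *q;	/* destination queue of this request */
	int id;
	struct req *next;
};

/* Stand-in for ->queue_rqs(): dispatch a whole batch for one queue. */
static void dispatch_batch(struct queue *q, struct req *batch)
{
	for (struct req *r = batch; r; r = r->next)
		printf("%s: dispatch req %d\n", q->name, r->id);
}

/*
 * For each pass, pick the first request's queue, unlink every request
 * for that queue from the plug list (unmatched requests stay in place),
 * and dispatch the matching requests as one batch.
 */
static void flush_plug_model(struct req *plug)
{
	while (plug) {
		struct queue *q = plug->q;
		struct req *batch = NULL, **batch_tail = &batch;
		struct req **pos = &plug;

		while (*pos) {
			struct req *r = *pos;

			if (r->q == q) {
				*pos = r->next;		/* unlink from plug list */
				r->next = NULL;
				*batch_tail = r;	/* append to batch */
				batch_tail = &r->next;
			} else {
				pos = &r->next;		/* leave for a later pass */
			}
		}
		dispatch_batch(q, batch);
	}
}

int main(void)
{
	struct queue nvme0 = { "nvme0" }, nvme1 = { "nvme1" };
	struct req r3 = { &nvme0, 3, NULL };
	struct req r2 = { &nvme1, 2, &r3 };
	struct req r1 = { &nvme0, 1, &r2 };

	flush_plug_model(&r1);	/* nvme0 gets reqs 1 and 3, nvme1 gets req 2 */
	return 0;
}

Running the sketch prints the requests grouped by queue (nvme0 gets 1
and 3, nvme1 gets 2), mirroring how a plug spanning two NVMe devices
would be split into one batched dispatch per device rather than going
through the per-hctx dispatch list and its spinlock.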
