Message-ID: <856ed55d-b07b-499c-b340-2efa70c73f7a@gmail.com>
Date: Wed, 29 Jan 2025 18:57:00 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Max Kellermann <max.kellermann@...os.com>, axboe@...nel.dk,
io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/8] io_uring/io-wq: cache work->flags in variable

On 1/28/25 13:39, Max Kellermann wrote:
> This eliminates several redundant atomic reads and therefore reduces
> the duration the surrounding spinlocks are held.

What architecture are you running? I don't get why the reads are
expensive while they're relaxed and there shouldn't even be any
contention. It doesn't even need to be atomics; we should still be
able to convert it back to plain ints.
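
For reference, a minimal userspace sketch of the pattern the patch is
about: one relaxed load of the flags word cached in a local, instead of
re-reading the atomic for every bit test. The struct, flag and function
names below are illustrative only, not the actual io-wq code.

/* cache a relaxed atomic load of a flags word in a local int and test
 * the cached copy, rather than issuing a fresh atomic read per test */
#include <stdatomic.h>
#include <stdio.h>

#define WORK_F_HASHED	(1u << 0)
#define WORK_F_CANCEL	(1u << 1)

struct work {
	atomic_uint flags;	/* stands in for io_wq_work->flags */
};

static void handle_work(struct work *w)
{
	/* single relaxed load; subsequent tests use the local copy */
	unsigned int flags = atomic_load_explicit(&w->flags,
						  memory_order_relaxed);

	if (flags & WORK_F_HASHED)
		printf("hashed\n");
	if (flags & WORK_F_CANCEL)
		printf("cancelled\n");
}

int main(void)
{
	struct work w;

	atomic_init(&w.flags, WORK_F_HASHED);
	handle_work(&w);
	return 0;
}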
> In several io_uring benchmarks, this reduced the CPU time spent in
> queued_spin_lock_slowpath() considerably:
>
> io_uring benchmark with a flood of `IORING_OP_NOP` and `IOSQE_ASYNC`:
>
> 38.86% -1.49% [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 6.75% +0.36% [kernel.kallsyms] [k] io_worker_handle_work
> 2.60% +0.19% [kernel.kallsyms] [k] io_nop
> 3.92% +0.18% [kernel.kallsyms] [k] io_req_task_complete
> 6.34% -0.18% [kernel.kallsyms] [k] io_wq_submit_work
>
> HTTP server, static file:
>
> 42.79% -2.77% [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 2.08% +0.23% [kernel.kallsyms] [k] io_wq_submit_work
> 1.19% +0.20% [kernel.kallsyms] [k] amd_iommu_iotlb_sync_map
> 1.46% +0.15% [kernel.kallsyms] [k] ep_poll_callback
> 1.80% +0.15% [kernel.kallsyms] [k] io_worker_handle_work
>
> HTTP server, PHP:
>
> 35.03% -1.80% [kernel.kallsyms] [k] queued_spin_lock_slowpath
> 0.84% +0.21% [kernel.kallsyms] [k] amd_iommu_iotlb_sync_map
> 1.39% +0.12% [kernel.kallsyms] [k] _copy_to_iter
> 0.21% +0.10% [kernel.kallsyms] [k] update_sd_lb_stats
>
> Signed-off-by: Max Kellermann <max.kellermann@...os.com>
--
Pavel Begunkov