Message-ID: <178c6f1b-cfda-46cb-8c15-a25a5319f6e4@kernel.dk>
Date: Thu, 30 Jan 2025 07:54:36 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Pavel Begunkov <asml.silence@...il.com>,
Max Kellermann <max.kellermann@...os.com>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/8] io_uring/io-wq: cache work->flags in variable
On 1/29/25 4:41 PM, Pavel Begunkov wrote:
> On 1/29/25 19:11, Max Kellermann wrote:
>> On Wed, Jan 29, 2025 at 7:56 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>>> What architecture are you running? I don't get why the reads
>>> are expensive while it's relaxed and there shouldn't even be
>>> any contention. It doesn't even need to be atomics, we still
>>> should be able to convert it back to plain ints.
>>
>> I measured on an AMD Epyc 9654P.
>> As you see in my numbers, around 40% of the CPU time was wasted on
>> spinlock contention. Dozens of io-wq threads are trampling on each
>> other's feet all the time.
>> I don't think this is about memory accesses being exceptionally
>> expensive; it's just about wringing every cycle from the code section
>> that's under the heavy-contention spinlock.
>
> Ok, then it's an architectural problem and needs more serious
> reengineering, e.g. of how work items are stored and grabbed, and it
> might even get some more use cases for io_uring. FWIW, I'm not saying
> smaller optimisations shouldn't have a place, especially when they're
> clean.
Totally agree - io-wq would need some improvements to where work is
queued and pulled from to make it scale better, which may indeed be a
good idea to do and would open it up to more use cases that currently
don't make much sense.
That said, I also agree that the minor optimizations still have a place;
it's not like they will stand in the way of general improvements.
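For readers following along, here is a rough userspace sketch of the kind
of minor optimization the patch subject describes: reading an atomic flags
word once into a local variable instead of re-reading it for every test.
It uses C11 atomics rather than the kernel's atomic_t, and the struct,
flag names, and shift value are hypothetical and chosen only for
illustration; this is not the io-wq code. The assumption is that compilers
generally do not merge repeated relaxed atomic loads, so the cached
variant does fewer loads inside a contended critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag layout, for illustration only (not the io-wq one). */
enum {
    WORK_HASHED     = 1u << 0,  /* item belongs to a hash bucket */
    WORK_UNBOUND    = 1u << 1,  /* item may run on an unbound worker */
    WORK_HASH_SHIFT = 24,       /* bucket index lives in the top bits */
};

struct work {
    atomic_uint flags;
};

/* One relaxed load of the flags word. */
static unsigned int work_flags(struct work *w)
{
    return atomic_load_explicit(&w->flags, memory_order_relaxed);
}

/* Re-reads the flags for each test: two atomic loads on the hashed path. */
static int hash_bucket_reload(struct work *w)
{
    assert(w != NULL);
    if (!(work_flags(w) & WORK_HASHED))
        return -1;
    return (int)(work_flags(w) >> WORK_HASH_SHIFT);
}

/* Reads the flags once and reuses the local copy: one atomic load. */
static int hash_bucket_cached(struct work *w)
{
    assert(w != NULL);
    const unsigned int flags = work_flags(w);

    if (!(flags & WORK_HASHED))
        return -1;
    return (int)(flags >> WORK_HASH_SHIFT);
}
```

Both functions return the same result when the flags do not change between
loads. The cached form also tests one consistent snapshot of the flags,
whereas the reloading form could in principle see two different values.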
--
Jens Axboe