Message-ID: <b4e6ef48-6b3e-4b62-978c-6e80d3e9218e@gmail.com>
Date: Fri, 21 Feb 2025 15:28:28 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Bui Quang Minh <minhquangbui99@...il.com>, io-uring@...r.kernel.org
Cc: Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] io_uring/io-wq: try to batch multiple free work
On 2/21/25 14:52, Bui Quang Minh wrote:
> On 2/21/25 19:44, Pavel Begunkov wrote:
>> On 2/21/25 04:19, Bui Quang Minh wrote:
>>> Currently, when IORING_SETUP_DEFER_TASKRUN is not used, an io worker
>>> that frees work needs to add a task work. This creates contention
>>> on tctx->task_list. With this commit, the io worker queues freed work
>>> on a local list and batches multiple free works into one call once the
>>> number of works on the local list exceeds IO_REQ_ALLOC_BATCH.
>>
>> I see no relation to IO_REQ_ALLOC_BATCH, that should be
>> a separate macro.
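
For illustration, here's a rough userspace sketch of the batching idea
(hypothetical names and a made-up threshold, not the patch itself):
freed work is collected on a local list and handed over in one flush
call once the batch grows past a threshold or the owning context
changes.

/* compile with: cc -o batch batch.c */
#include <stddef.h>
#include <stdio.h>

#define FREE_BATCH 32      /* stand-in for a dedicated batch macro */

struct work { struct work *next; int ctx_id; };

struct batch {
	struct work *head, *tail;
	int ctx_id;        /* all queued works share one context */
	int count;
};

/* Flush the whole batch in a single call (one task work in the patch). */
static void flush_batch(struct batch *b)
{
	if (!b->count)
		return;
	printf("flushing %d works for ctx %d\n", b->count, b->ctx_id);
	b->head = b->tail = NULL;
	b->count = 0;
}

static void queue_free(struct batch *b, struct work *w)
{
	/* Works from a different context can't share a batch: flush first. */
	if (b->count && b->ctx_id != w->ctx_id)
		flush_batch(b);

	w->next = NULL;
	if (b->tail)
		b->tail->next = w;
	else
		b->head = w;
	b->tail = w;
	b->ctx_id = w->ctx_id;

	if (++b->count >= FREE_BATCH)
		flush_batch(b);
}

int main(void)
{
	struct batch b = { 0 };
	struct work works[100];
	int i;

	for (i = 0; i < 100; i++) {
		works[i].ctx_id = i < 50 ? 1 : 2;  /* context change mid-stream */
		queue_free(&b, &works[i]);
	}
	flush_batch(&b);                           /* flush any leftovers */
	return 0;
}
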
>>
>>> Signed-off-by: Bui Quang Minh <minhquangbui99@...il.com>
>>> ---
>>> io_uring/io-wq.c | 62 +++++++++++++++++++++++++++++++++++++++++++--
>>> io_uring/io-wq.h | 4 ++-
>>> io_uring/io_uring.c | 23 ++++++++++++++---
>>> io_uring/io_uring.h | 6 ++++-
>>> 4 files changed, 87 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
>>> index 5d0928f37471..096711707db9 100644
>>> --- a/io_uring/io-wq.c
>>> +++ b/io_uring/io-wq.c
>> ...
>>> @@ -601,7 +622,41 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
>>> wq->do_work(work);
>>> io_assign_current_work(worker, NULL);
>>> - linked = wq->free_work(work);
>>> + /*
>>> + * All requests in free list must have the same
>>> + * io_ring_ctx.
>>> + */
>>> + if (last_added_ctx && last_added_ctx != req->ctx) {
>>> + flush_req_free_list(&free_list, tail);
>>> + tail = NULL;
>>> + last_added_ctx = NULL;
>>> + free_req = 0;
>>> + }
>>> +
>>> + /*
>>> + * Try to batch free work when
>>> + * !IORING_SETUP_DEFER_TASKRUN to reduce contention
>>> + * on tctx->task_list.
>>> + */
>>> + if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
>>> + linked = wq->free_work(work, NULL, NULL);
>>> + else
>>> + linked = wq->free_work(work, &free_list, &did_free);
>>
>> The problem here is that iowq is blocking, and hence you lock up the
>> resources of already completed requests for who knows how long. In the
>> case of unbound requests (see IO_WQ_ACCT_UNBOUND) it's indefinite, and
>> it absolutely cannot be used without some kind of a timer. But even in
>> the case of bound work, it can be pretty long.
> That's a good point; I'd overlooked the fact that the work handler might block indefinitely.
>> Maybe for bound requests it can target N like here, but read jiffies
>> in between requests and flush if it has been too long. So in the worst
>> case the total delay is the last request's execution time + DT. But
>> even then it feels wrong, especially with filesystems sometimes not
>> even honouring NOWAIT.
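
Rough userspace sketch of that time-bounded variant (assumed names and
a millisecond clock standing in for jiffies, not kernel code): flush
either when the batch reaches N or when too much time has passed since
the first queued work, checking the clock between requests, so the
worst-case added latency is bounded by the delay window plus the
execution time of the last request.

#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <time.h>

#define FLUSH_BATCH    32
#define FLUSH_DELAY_MS  5

struct batch_state {
	int count;
	long long first_queued_ms;   /* stand-in for a saved jiffies value */
};

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Checked after each request is executed. */
static bool should_flush(struct batch_state *b)
{
	if (!b->count)
		return false;
	if (b->count >= FLUSH_BATCH)
		return true;
	return now_ms() - b->first_queued_ms >= FLUSH_DELAY_MS;
}

int main(void)
{
	struct batch_state b = { .count = 1, .first_queued_ms = now_ms() };

	/* One queued work, no time elapsed: no flush yet. */
	return should_flush(&b) ? 1 : 0;
}
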
>>
>> The question is, why do you force it into the worker pool with the
>> IOSQE_ASYNC flag? It's generally not recommended, and the name of the
>> flag is confusing; it should've been something more like "WORKER_OFFLOAD".
>
>
> I launched more workers to parallelize the work handler, but as you said, it seems like an incorrect use case.
Not so much incorrect as generally inefficient and not recommended,
especially before trying it without the flag. People often fall into
the trap of "Oh, it's _async_, surely I have to set it"; the naming is
really unfortunate.
> However, freeing a request seems heavy: we need to create a task work so that we can hold the uring_lock to queue the request onto ctx->submit_state->compl_reqs. Let me play around more to see if I can find an optimization for this.
That's because it's a slow fallback path for cases that can't do
async for one reason or another, and ideally we wouldn't have it
at all. In reality it's used more than I'd wish for, but it's
still a path we don't heavily optimise.
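
To make the cost argument above concrete, here's a userspace analogy
(hypothetical, not the actual io_uring completion path): completing
each freed request individually pays one lock round-trip per request,
while handing over a whole batch pays it once, which is what batching
the free work tries to amortise.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int completed;

/* One lock acquisition per request: the heavy per-item path. */
static void complete_one(void)
{
	pthread_mutex_lock(&lock);
	completed++;
	pthread_mutex_unlock(&lock);
}

/* One lock acquisition for a whole batch of n requests. */
static void complete_batch(int n)
{
	pthread_mutex_lock(&lock);
	completed += n;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	int i;

	for (i = 0; i < 32; i++)
		complete_one();    /* 32 lock round-trips */
	complete_batch(32);        /* 1 lock round-trip for the same work */
	return completed == 64 ? 0 : 1;
}
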
Btw, if you're really spamming iowq threads, I'm surprised that's
the only hotspot you have. There should be some contention for
CQE posting (->completion_lock), and probably in the iowq queue
locking, and so on.
--
Pavel Begunkov