Message-ID: <8e0f74c1-aa11-4036-ba20-6f4dc0c40333@gmail.com>
Date: Thu, 24 Oct 2024 15:49:01 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>, hexue <xue01.he@...sung.com>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8] io_uring: releasing CPU resources when polling
On 10/24/24 15:40, Jens Axboe wrote:
> On 10/24/24 8:26 AM, Pavel Begunkov wrote:
>> On 10/24/24 15:18, Jens Axboe wrote:
>>> On 10/23/24 8:38 PM, hexue wrote:
>>>> On 9/25/2024 12:12, Pavel Begunkov wrote:
>> ...
>>>> When the number of threads exceeds the number of CPU cores, the
>>>> database throughput does not increase significantly. However,
>>>> hybrid polling can release some CPU resources during the polling
>>>> process, so that part of the CPU time can be used for frequent
>>>> data processing and other operations, which speeds up the reading
>>>> process, thereby improving throughput and optimizing database
>>>> performance. I tried different compression strategies and got
>>>> results similar to the above table (~30% throughput improvement).
>>>>
>>>> As more database applications adapt to the io_uring engine, I think
>>>> the application of hybrid poll may have potential in some scenarios.
>>>
>>> Thanks for posting some numbers on that part, that's useful. I do
>>> think the feature is useful as well, but I still have some issues
>>> with the implementation. Below is an incremental patch on top of
>>> yours to resolve some of those, potentially. Issues:
>>>
>>> 1) The patch still reads a bit like a hack, in that it doesn't seem to
>>> care about following the current style. This reads a bit lazy/sloppy
>>> or unfinished. I've fixed that up.
>>>
>>> 2) Appropriate member and function naming.
>>>
>>> 3) Same as above, it doesn't care about proper placement of additions to
>>> structs. Again this is a bit lazy and wasteful, attention should be
>>> paid to where additions are placed so as not to needlessly bloat
>>> structures, or place members in cache-unfortunate locations. For
>>> example, available_time is just placed at the end of io_ring_ctx,
>>> why? It's a submission side member, and there's room with other
>>> related members. Not only is the placement now where you'd want it to
>>> be, memory-wise, it also doesn't add 8 bytes to io_ring_ctx.
>>>
>>> 4) Like the above, the io_kiocb bloat is, by far, the worst. Seems to me
>>> that we can share space with the polling hash_node. This obviously
>>> needs careful vetting, haven't done that yet. IOPOLL setups should
>>> not be using poll at all. This needs extra checking. The poll_state
>>> can go with cancel_seq_set, as there's a hole there anyway. And just
>>> like that, rather than add 24 bytes to io_kiocb, it doesn't take any
>>> extra space at all.
>>>
>>> 5) HY_POLL is a terrible name. It's associated with IOPOLL, and so let's
>>> please use a name related to that. And require IOPOLL to be set along
>>> with HYBRID_IOPOLL, as it's really a variant of that. Makes it clear
>>> that HYBRID_IOPOLL is really just a mode of operation for IOPOLL, and
>>> it can't exist without that.
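
[Illustration only, not part of the quoted review or the patch: a minimal
sketch of the setup-time check point 5 asks for, assuming the flag ends up
spelled IORING_SETUP_HYBRID_IOPOLL; the final name is not settled in this
thread.]

/*
 * Sketch: reject a hybrid-poll ring that was not also created with
 * IORING_SETUP_IOPOLL, since hybrid poll is only a mode of IOPOLL.
 * The flag name IORING_SETUP_HYBRID_IOPOLL is assumed for the example.
 */
static int io_validate_hybrid_poll(unsigned int flags)
{
	if ((flags & IORING_SETUP_HYBRID_IOPOLL) &&
	    !(flags & IORING_SETUP_IOPOLL))
		return -EINVAL;
	return 0;
}

Such a helper would presumably be called from the flag validation done at
ring creation, so the dependency fails early with -EINVAL rather than being
silently ignored.
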
>>>
>>> Please take a look at this incremental and test it, and then post a v9
>>> that looks a lot more finished. Caveat - I haven't tested this one at
>>> all. Thanks!
>>>
>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>> index c79ee9fe86d4..6cf6a45835e5 100644
>>> --- a/include/linux/io_uring_types.h
>>> +++ b/include/linux/io_uring_types.h
>>> @@ -238,6 +238,8 @@ struct io_ring_ctx {
>>> struct io_rings *rings;
>>> struct percpu_ref refs;
>>> + u64 poll_available_time;
>>> +
>>> clockid_t clockid;
>>> enum tk_offsets clock_offset;
>>> @@ -433,9 +435,6 @@ struct io_ring_ctx {
>>> struct page **sqe_pages;
>>> struct page **cq_wait_page;
>>> -
>>> - /* for io_uring hybrid poll*/
>>> - u64 available_time;
>>> };
>>> struct io_tw_state {
>>> @@ -647,9 +646,22 @@ struct io_kiocb {
>>> atomic_t refs;
>>> bool cancel_seq_set;
>>> + bool poll_state;
>>
>> As mentioned briefly before, that can be just a req->flags flag
>
> That'd be even better, I generally despise random bool addition.
>
>>> struct io_task_work io_task_work;
>>> - /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
>>> - struct hlist_node hash_node;
>>> + union {
>>> + /*
>>> + * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
>>> + * poll
>>> + */
>>> + struct hlist_node hash_node;
>>> + /*
>>> + * For IOPOLL setup queues, with hybrid polling
>>> + */
>>> + struct {
>>> + u64 iopoll_start;
>>> + u64 iopoll_end;
>>
>> And IIRC it doesn't need to store the end as it's used immediately
>> after it's set in the same function.
>
> Nice, that opens up the door for less esoteric sharing as well. And
> yeah, I'd just use:
>
> runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
>
> in io_uring_hybrid_poll() and kill it entirely, doesn't even need a
> local variable there. And then shove iopoll_start into the union with
> comp_list/apoll_events.
That's what the current request is hooked into the list with,
IOW such aliasing will corrupt the request.
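
[Sketch for context, not a proposed layout: in io_uring_types.h, comp_list
already shares a union with apoll_events, and it is the node that keeps the
request linked on ctx->iopoll_list for the whole time it is being polled,
so a timestamp written into that union during polling would overwrite the
link.]

/*
 * Rough illustration of the conflict only. comp_list links the request
 * into ctx->iopoll_list while it is being polled, so it cannot share
 * storage with a timestamp that gets written during that same window.
 */
union {
	struct io_wq_work_node	comp_list;	/* live while on iopoll_list */
	__poll_t		apoll_events;	/* armed poll requests */
	u64			iopoll_start;	/* would clobber comp_list */
};
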
--
Pavel Begunkov