linux-kernel - Re: [PATCH v8] io_uring: releasing CPU resources when polling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8e0f74c1-aa11-4036-ba20-6f4dc0c40333@gmail.com>
Date: Thu, 24 Oct 2024 15:49:01 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>, hexue <xue01.he@...sung.com>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8] io_uring: releasing CPU resources when polling

On 10/24/24 15:40, Jens Axboe wrote:
> On 10/24/24 8:26 AM, Pavel Begunkov wrote:
>> On 10/24/24 15:18, Jens Axboe wrote:
>>> On 10/23/24 8:38 PM, hexue wrote:
>>>> On 9/25/2024 12:12, Pavel Begunkov wrote:
>> ...
>>>> When the number of threads exceeds the number of CPU cores,the
>>>> database throughput does not increase significantly. However,
>>>> hybrid polling can releasing some CPU resources during the polling
>>>> process, so that part of the CPU time can be used for frequent
>>>> data processing and other operations, which speeds up the reading
>>>> process, thereby improving throughput and optimizaing database
>>>> performance.I tried different compression strategies and got
>>>> results similar to the above table.(~30% throughput improvement)
>>>>
>>>> As more database applications adapt to the io_uring engine, I think
>>>> the application of hybrid poll may have potential in some scenarios.
>>>
>>> Thanks for posting some numbers on that part, that's useful. I do
>>> think the feature is useful as well, but I still have some issues
>>> with the implementation. Below is an incremental patch on top of
>>> yours to resolve some of those, potentially. Issues:
>>>
>>> 1) The patch still reads a bit like a hack, in that it doesn't seem to
>>>      care about following the current style. This reads a bit lazy/sloppy
>>>      or unfinished. I've fixed that up.
>>>
>>> 2) Appropriate member and function naming.
>>>
>>> 3) Same as above, it doesn't care about proper placement of additions to
>>>      structs. Again this is a bit lazy and wasteful, attention should be
>>>      paid to where additions are placed to not needlessly bloat
>>>      structures, or place members in cache unfortunate locations. For
>>>      example, available_time is just placed at the end of io_ring_ctx,
>>>      why? It's a submission side member, and there's room with other
>>>      related members. Not only is the placement now where you'd want it to
>>>      be, memory wise, it also doesn't add 8 bytes to io_uring_ctx.
>>>
>>> 4) Like the above, the io_kiocb bloat is, by far, the worst. Seems to me
>>>      that we can share space with the polling hash_node. This obviously
>>>      needs careful vetting, haven't done that yet. IOPOLL setups should
>>>      not be using poll at all. This needs extra checking. The poll_state
>>>      can go with cancel_seq_set, as there's a hole there any. And just
>>>      like that, rather than add 24b to io_kiocb, it doesn't take any extra
>>>      space at all.
>>>
>>> 5) HY_POLL is a terrible name. It's associated with IOPOLL, and so let's
>>>      please use a name related to that. And require IOPOLL being set with
>>>      HYBRID_IOPOLL, as it's really a variant of that. Makes it clear that
>>>      HYBRID_IOPOLL is really just a mode of operation for IOPOLL, and it
>>>      can't exist without that.
>>>
>>> Please take a look at this incremental and test it, and then post a v9
>>> that looks a lot more finished. Caveat - I haven't tested this one at
>>> all. Thanks!
>>>
>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>> index c79ee9fe86d4..6cf6a45835e5 100644
>>> --- a/include/linux/io_uring_types.h
>>> +++ b/include/linux/io_uring_types.h
>>> @@ -238,6 +238,8 @@ struct io_ring_ctx {
>>>            struct io_rings        *rings;
>>>            struct percpu_ref    refs;
>>>    +        u64            poll_available_time;
>>> +
>>>            clockid_t        clockid;
>>>            enum tk_offsets        clock_offset;
>>>    @@ -433,9 +435,6 @@ struct io_ring_ctx {
>>>        struct page            **sqe_pages;
>>>          struct page            **cq_wait_page;
>>> -
>>> -    /* for io_uring hybrid poll*/
>>> -    u64            available_time;
>>>    };
>>>      struct io_tw_state {
>>> @@ -647,9 +646,22 @@ struct io_kiocb {
>>>          atomic_t            refs;
>>>        bool                cancel_seq_set;
>>> +    bool                poll_state;
>>
>> As mentioned briefly before, that can be just a req->flags flag
> 
> That'd be even better, I generally despise random bool addition.
> 
>>>        struct io_task_work        io_task_work;
>>> -    /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
>>> -    struct hlist_node        hash_node;
>>> +    union {
>>> +        /*
>>> +         * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
>>> +         * poll
>>> +         */
>>> +        struct hlist_node    hash_node;
>>> +        /*
>>> +         * For IOPOLL setup queues, with hybrid polling
>>> +         */
>>> +        struct {
>>> +            u64        iopoll_start;
>>> +            u64        iopoll_end;
>>
>> And IIRC it doesn't need to store the end as it's used immediately
>> after it's set in the same function.
> 
> Nice, that opens up the door for less esoteric sharing as well. And
> yeah, I'd just use:
> 
> runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
> 
> in io_uring_hybrid_poll() and kill it entirely, doesn't even need a
> local variable there. And then shove iopoll_start into the union with
> comp_list/apoll_events.

That's with what the current request is hooked into the list,
IOW such aliasing will corrupt the request

-- 
Pavel Begunkov