lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62e57b0e-b646-4f96-bb83-5a0ecb4050da@kernel.dk>
Date: Thu, 24 Oct 2024 08:49:30 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Pavel Begunkov <asml.silence@...il.com>, hexue <xue01.he@...sung.com>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8] io_uring: releasing CPU resources when polling

On 10/24/24 8:49 AM, Pavel Begunkov wrote:
> On 10/24/24 15:40, Jens Axboe wrote:
>> On 10/24/24 8:26 AM, Pavel Begunkov wrote:
>>> On 10/24/24 15:18, Jens Axboe wrote:
>>>> On 10/23/24 8:38 PM, hexue wrote:
>>>>> On 9/25/2024 12:12, Pavel Begunkov wrote:
>>> ...
>>>>> When the number of threads exceeds the number of CPU cores,the
>>>>> database throughput does not increase significantly. However,
>>>>> hybrid polling can releasing some CPU resources during the polling
>>>>> process, so that part of the CPU time can be used for frequent
>>>>> data processing and other operations, which speeds up the reading
>>>>> process, thereby improving throughput and optimizaing database
>>>>> performance.I tried different compression strategies and got
>>>>> results similar to the above table.(~30% throughput improvement)
>>>>>
>>>>> As more database applications adapt to the io_uring engine, I think
>>>>> the application of hybrid poll may have potential in some scenarios.
>>>>
>>>> Thanks for posting some numbers on that part, that's useful. I do
>>>> think the feature is useful as well, but I still have some issues
>>>> with the implementation. Below is an incremental patch on top of
>>>> yours to resolve some of those, potentially. Issues:
>>>>
>>>> 1) The patch still reads a bit like a hack, in that it doesn't seem to
>>>>      care about following the current style. This reads a bit lazy/sloppy
>>>>      or unfinished. I've fixed that up.
>>>>
>>>> 2) Appropriate member and function naming.
>>>>
>>>> 3) Same as above, it doesn't care about proper placement of additions to
>>>>      structs. Again this is a bit lazy and wasteful, attention should be
>>>>      paid to where additions are placed to not needlessly bloat
>>>>      structures, or place members in cache unfortunate locations. For
>>>>      example, available_time is just placed at the end of io_ring_ctx,
>>>>      why? It's a submission side member, and there's room with other
>>>>      related members. Not only is the placement now where you'd want it to
>>>>      be, memory wise, it also doesn't add 8 bytes to io_uring_ctx.
>>>>
>>>> 4) Like the above, the io_kiocb bloat is, by far, the worst. Seems to me
>>>>      that we can share space with the polling hash_node. This obviously
>>>>      needs careful vetting, haven't done that yet. IOPOLL setups should
>>>>      not be using poll at all. This needs extra checking. The poll_state
>>>>      can go with cancel_seq_set, as there's a hole there any. And just
>>>>      like that, rather than add 24b to io_kiocb, it doesn't take any extra
>>>>      space at all.
>>>>
>>>> 5) HY_POLL is a terrible name. It's associated with IOPOLL, and so let's
>>>>      please use a name related to that. And require IOPOLL being set with
>>>>      HYBRID_IOPOLL, as it's really a variant of that. Makes it clear that
>>>>      HYBRID_IOPOLL is really just a mode of operation for IOPOLL, and it
>>>>      can't exist without that.
>>>>
>>>> Please take a look at this incremental and test it, and then post a v9
>>>> that looks a lot more finished. Caveat - I haven't tested this one at
>>>> all. Thanks!
>>>>
>>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>>> index c79ee9fe86d4..6cf6a45835e5 100644
>>>> --- a/include/linux/io_uring_types.h
>>>> +++ b/include/linux/io_uring_types.h
>>>> @@ -238,6 +238,8 @@ struct io_ring_ctx {
>>>>            struct io_rings        *rings;
>>>>            struct percpu_ref    refs;
>>>>    +        u64            poll_available_time;
>>>> +
>>>>            clockid_t        clockid;
>>>>            enum tk_offsets        clock_offset;
>>>>    @@ -433,9 +435,6 @@ struct io_ring_ctx {
>>>>        struct page            **sqe_pages;
>>>>          struct page            **cq_wait_page;
>>>> -
>>>> -    /* for io_uring hybrid poll*/
>>>> -    u64            available_time;
>>>>    };
>>>>      struct io_tw_state {
>>>> @@ -647,9 +646,22 @@ struct io_kiocb {
>>>>          atomic_t            refs;
>>>>        bool                cancel_seq_set;
>>>> +    bool                poll_state;
>>>
>>> As mentioned briefly before, that can be just a req->flags flag
>>
>> That'd be even better, I generally despise random bool addition.
>>
>>>>        struct io_task_work        io_task_work;
>>>> -    /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
>>>> -    struct hlist_node        hash_node;
>>>> +    union {
>>>> +        /*
>>>> +         * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
>>>> +         * poll
>>>> +         */
>>>> +        struct hlist_node    hash_node;
>>>> +        /*
>>>> +         * For IOPOLL setup queues, with hybrid polling
>>>> +         */
>>>> +        struct {
>>>> +            u64        iopoll_start;
>>>> +            u64        iopoll_end;
>>>
>>> And IIRC it doesn't need to store the end as it's used immediately
>>> after it's set in the same function.
>>
>> Nice, that opens up the door for less esoteric sharing as well. And
>> yeah, I'd just use:
>>
>> runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
>>
>> in io_uring_hybrid_poll() and kill it entirely, doesn't even need a
>> local variable there. And then shove iopoll_start into the union with
>> comp_list/apoll_events.
> 
> That's with what the current request is hooked into the list,
> IOW such aliasing will corrupt the request

Ah true, well some other spot then, should be pretty easy to find 8
bytes for iopoll_start. As mentioned, the point is really just to THINK
about where it should go, rather than lazily just shove it at the end
like no thought has been given to it.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ