linux-kernel - Re: [PATCH v2 0/2] Optimise io

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a307986b-94aa-f26f-fc5b-d0865083d060@gmail.com>
Date:   Tue, 24 Sep 2019 21:28:37 +0300
From:   Pavel Begunkov <asml.silence@...il.com>
To:     Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting

On 24/09/2019 20:46, Jens Axboe wrote:
> On 9/24/19 11:33 AM, Pavel Begunkov wrote:
>> On 24/09/2019 16:13, Jens Axboe wrote:
>>> On 9/24/19 5:23 AM, Pavel Begunkov wrote:
>>>>> Yep that should do it, and saves 8 bytes of stack as well.
>>>>>
>>>>> BTW, did you test my patch, this one or the previous? Just curious if it
>>>>> worked for you.
>>>>>
>>>> Not yet, going to do that tonight
>>>
>>> Thanks! For reference, the final version is below. There was still a
>>> signal mishap in there, now it should all be correct afaict.
>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 9b84232e5cc4..d2a86164d520 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2768,6 +2768,38 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
>>>   	return submit;
>>>   }
>>>   
>>> +struct io_wait_queue {
>>> +	struct wait_queue_entry wq;
>>> +	struct io_ring_ctx *ctx;
>>> +	unsigned to_wait;
>>> +	unsigned nr_timeouts;
>>> +};
>>> +
>>> +static inline bool io_should_wake(struct io_wait_queue *iowq)
>>> +{
>>> +	struct io_ring_ctx *ctx = iowq->ctx;
>>> +
>>> +	/*
>>> +	 * Wake up if we have enough events, or if a timeout occured since we
>>> +	 * started waiting. For timeouts, we always want to return to userspace,
>>> +	 * regardless of event count.
>>> +	 */
>>> +	return io_cqring_events(ctx->rings) >= iowq->to_wait ||
>>> +			atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
>>> +}
>>> +
>>> +static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
>>> +			    int wake_flags, void *key)
>>> +{
>>> +	struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
>>> +							wq);
>>> +
>>> +	if (!io_should_wake(iowq))
>>> +		return -1;
>>
>> It would try to schedule only the first task in the wait list. Is that the
>> semantic you want?
>> E.g. for waiters=[32,8] and nr_events == 8, io_wake_function() returns
>> after @32, and won't wake up the second one.
> 
> Right, those are the semantics I want. We keep the list ordered by using
> the exclusive wait addition. Which means that for the case you list,
> waiters=32 came first, and we should not wake others before that task
> gets the completions it wants. Otherwise we could potentially starve
> higher count waiters, if we always keep going and new waiters come in.
> 
Yes. I think It would better to be documented in userspace API. I
could imagine some crazy case deadlocking userspace. E.g. 
thread 1: wait_events(8), reap_events
thread 2: wait_events(32), wait(thread 1), reap_events

works well
Reviewed-by: Pavel Begunkov <asml.silence@...il.com>
Tested-by: Pavel Begunkov <asml.silence@...il.com>

BTW, I searched for wait_event*(), and it seems there are plenty of
similar use cases. So, generic case would be useful, but this is for
later.



-- 
Yours sincerely,
Pavel Begunkov



Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)