Message-ID: <c15b2d54-c722-8fb4-266f-b589c1a21aa5@gmail.com>
Date: Mon, 23 Sep 2019 19:21:51 +0300
From: Pavel Begunkov <asml.silence@...il.com>
To: Ingo Molnar <mingo@...nel.org>, Jens Axboe <axboe@...nel.dk>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting
Hi, and thanks for the feedback.
It could indeed be done with @cond; that's how it works now.
However, this patchset addresses a performance problem only.
The problem with wait_event_*() is that, if we have a counter and are
trying to wake up tasks after each increment, it would schedule each
waiting task O(threshold) times just for it to spuriously check @cond
and go back to sleep. All that overhead (memory barriers, register
save/restore, accounting, etc.) turned out to be enough to slow the
system down for some workloads.
With this specialisation it still traverses a wait list and makes
indirect calls to the checker callback, but the list supposedly is
fairly small, so performance there shouldn't be a problem, at least for
now.
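To make the overhead concrete, here is a minimal C sketch of the plain
wait_event_*() pattern described above (my_ctx, complete_one() and
wait_for_completions() are made-up names for illustration, not the
actual io_uring code):

```
#include <linux/wait.h>
#include <linux/atomic.h>

/* Illustrative only: hypothetical context, not the io_uring structures. */
struct my_ctx {
	atomic_t		completions;
	struct wait_queue_head	wait;
};

/* Completion side: runs once per completed request. */
static void complete_one(struct my_ctx *ctx)
{
	atomic_inc(&ctx->completions);
	wake_up(&ctx->wait);	/* wakes the waiter on every increment */
}

/* Waiting side: with @to_wait as the target, the task gets scheduled
 * O(to_wait) times just to re-check the condition and sleep again. */
static int wait_for_completions(struct my_ctx *ctx, unsigned int to_wait)
{
	return wait_event_interruptible(ctx->wait,
			atomic_read(&ctx->completions) >= to_wait);
}
```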
Regarding semantics: it should wake a task when the value passed to
wake_up_threshold() is greater than or equal to that task's threshold,
which is specified individually for each task in wait_threshold_*().
In pseudo code:
```
def wake_up_threshold(n, wait_queue):
    for waiter in wait_queue:
        waiter.wake_up_if(n >= waiter.threshold)
```
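For reference, a rough C sketch of how that comparison could be wired
into a custom wake function; this is just an illustration of the idea,
not the patch itself, and the names (wait_threshold_entry,
wake_threshold_function) are made up:

```
#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/sched.h>

/* Hypothetical types/names, sketching the idea rather than the patch. */
struct wait_threshold_entry {
	struct wait_queue_entry	wq_entry;
	unsigned long		threshold;
};

/* Custom wake function: the waker passes the current counter value via
 * @key, and a waiter is only woken once that value reaches its threshold. */
static int wake_threshold_function(struct wait_queue_entry *wq_entry,
				   unsigned int mode, int flags, void *key)
{
	struct wait_threshold_entry *wte =
		container_of(wq_entry, struct wait_threshold_entry, wq_entry);

	if ((unsigned long)key < wte->threshold)
		return 0;	/* not enough completions yet, keep sleeping */

	return autoremove_wake_function(wq_entry, mode, flags, key);
}

/* Waker side: @n is compared against each waiter's threshold. */
static void wake_up_threshold(struct wait_queue_head *wq_head, unsigned long n)
{
	__wake_up(wq_head, TASK_NORMAL, 0, (void *)n);
}
```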
Any thoughts on how to do it better? Ideas are very welcome.
BTW, this monster is mostly a copy-paste from wait_event_*() and
wait_bit_*(). We could try to extract some common parts from these
three, but that's another topic.
On 23/09/2019 11:35, Ingo Molnar wrote:
>
> * Jens Axboe <axboe@...nel.dk> wrote:
>
>> On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
>>> From: Pavel Begunkov <asml.silence@...il.com>
>>>
>>> There could be a lot of overhead within the generic wait_event_*() used for
>>> waiting for a large number of completions. The patchset removes much of
>>> it by using a custom wait event (wait_threshold).
>>>
>>> Synthetic test showed ~40% performance boost. (see patch 2)
>>
>> I'm fine with the io_uring side of things, but to queue this up we
>> really need Peter or Ingo to sign off on the core wakeup bits...
>>
>> Peter?
>
> I'm not sure an extension is needed for such a special interface, why not
> just put a ->threshold value next to the ctx->wait field and use either
> the regular wait_event() APIs with the proper condition, or
> wait_event_cmd() style APIs if you absolutely need something more complex
> to happen inside?
>
> Should result in a much lower linecount and no scheduler changes. :-)
>
> Thanks,
>
> Ingo
>
--
Yours sincerely,
Pavel Begunkov