linux-kernel - Re: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd to reduce cpu utilization

Message-ID: <tencent_DC538304754DFEA48B3167F9F13E8B548D0A@qq.com>
Date:   Fri, 5 May 2023 00:01:13 +0800
From:   Wen Yang <wenyang.linux@...mail.com>
To:     Jens Axboe <axboe@...nel.dk>,
        Christian Brauner <brauner@...nel.org>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@....de>, Dylan Yudaken <dylany@...com>,
        David Woodhouse <dwmw@...zon.co.uk>,
        Paolo Bonzini <pbonzini@...hat.com>, Fu Wei <wefu@...hat.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] eventfd: support delayed wakeup for non-semaphore eventfd
 to reduce cpu utilization


On 2023/4/21 01:44, Wen Yang wrote:
>
> On 2023/4/20 00:42, Jens Axboe wrote:
>> On 4/19/23 3:12 AM, Christian Brauner wrote:
>>> On Tue, Apr 18, 2023 at 08:15:03PM -0600, Jens Axboe wrote:
>>>> On 4/17/23 10:32 AM, Wen Yang wrote:
>>>>> On 2023/4/17 22:38, Jens Axboe wrote:
>>>>>> On 4/16/23 5:31 AM, wenyang.linux@...mail.com wrote:
>>>>>>> From: Wen Yang <wenyang.linux@...mail.com>
>>>>>>>
>>>>>>> For the NON SEMAPHORE eventfd, if its counter has a nonzero value,
>>>>>>> then a read(2) returns 8 bytes containing that value, and the
>>>>>>> counter's value is reset to zero. Therefore, in the NON SEMAPHORE
>>>>>>> scenario, N event writes can be consumed by ONE event read.
>>>>>>>
>>>>>>> However, the current implementation wakes up the reading thread
>>>>>>> immediately in eventfd_write, which increases CPU utilization
>>>>>>> unnecessarily.
>>>>>>>
>>>>>>> Adding a configurable delay after eventfd_write avoids these
>>>>>>> unnecessary wakeups and thereby reduces CPU utilization.
>>>>>> What's the real-world use case for this, and what would the expected
>>>>>> delay be there? With a delayed work item, there's certainly a pretty
>>>>>> wide grey zone in terms of delay where this would perform
>>>>>> considerably worse than not doing any delayed wakeups at all.
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>> We have found that the CPU usage of the message middleware in our
>>>>> environment is high, because sensor messages from the MCU are reported
>>>>> very frequently and continuously, possibly several hundred thousand
>>>>> times per second. As a result, the message-receiving thread is
>>>>> frequently woken up to process short messages.
>>>>>
>>>>> The following is the simplified test code:
>>>>> https://github.com/w-simon/tests/blob/master/src/test.c
>>>>>
>>>>> The test code in this patch is a further simplified version of it.
>>>>>
>>>>> Finally, the patch only adds a configuration option, which gives
>>>>> users more choice.
>>>> I think you'd have a higher chance of getting this in if the delay
>>>> setting was per eventfd context, rather than a global thing.
>>> That patch seems really weird. Is that an established paradigm to
>>> address problems like this through a configured wakeup delay? Because
>>> naively this looks like a pretty brutal hack.
>> It is odd, and it is a brutal hack. My worries were outlined in an
>> earlier reply, there's quite a big gap where no delay would be better
>> and the delay approach would be miserable because it'd cause extra
>> latency and extra context switches. It'd be much cleaner if you KNEW
>> there'd be more events coming, as you could then get rid of that delayed
>> work item completely. And I suspect, if this patch makes sense, that
>> it'd be better to have a number+time limit as well and if you hit the
>> event number count that you'd notify inline and put some smarts in the
>> delayed work handling to just not do anything if nothing is pending.
>
> Thank you very much for your suggestion.
>
> We will improve the implementation according to your suggestions and
> send a v2 later.
>
>
Hi Jens, Christian,

Based on your valuable suggestions, and inspired by TCP's Delayed ACK
mechanism, we have reimplemented the patch as v2 and are currently
testing it.

We will resend it after several days of testing.

Thanks.


--

Best wishes,

Wen