linux-kernel - Re: Re: [PATCH v2] io_uring: releasing CPU resources when polling

Message-Id: <20240423032645.2546766-1-xue01.he@samsung.com>
Date: Tue, 23 Apr 2024 11:26:45 +0800
From: hexue <xue01.he@...sung.com>
To: axboe@...nel.dk
Cc: anuj20.g@...sung.com, asml.silence@...il.com, cliang01.li@...sung.com,
	io-uring@...r.kernel.org, joshi.k@...sung.com, kundan.kumar@...sung.com,
	linux-kernel@...r.kernel.org, peiwei.li@...sung.com, ruyi.zhang@...sung.com,
	wenwen.chen@...sung.com, xiaobing.li@...sung.com, xue01.he@...sung.com
Subject: Re: Re: [PATCH v2] io_uring: releasing CPU resources when polling

On 4/22/24 18:11, Jens Axboe wrote:
>On 4/18/24 3:31 AM, hexue wrote:
>> This patch is intended to release CPU resources that io_uring consumes
>> in polling mode. When IO is issued, the program immediately starts
>> polling for completion, which wastes CPU resources while the IO
>> commands are still executing on the disk.
>> 
>> I add a hybrid polling feature to io_uring, which enables polling to
>> release a portion of CPU resources without affecting the block layer.
>> 
>> - Record the running time and context switching time of each
>>   IO, and use these times to determine whether the process should
>>   continue to schedule.
>> 
>> - Adapt to different devices. Because times are recorded in real
>>   time and each device's IO processing speed is different, the CPU
>>   optimization effect will vary by device.
>> 
>> - Add an interface (ctx->flag) that lets the application choose
>>   whether or not to use this feature.
>> 
>> The CPU savings of the patch under peak workload were tested as follows:
>>   with the original polling, utilization is 100% on each CPU; after
>>   optimization, per-CPU utilization drops substantially:
>> 
>>    read(128k, QD64, 1Job)     37%   write(128k, QD64, 1Job)     40%
>>    randread(4k, QD64, 16Job)  52%   randwrite(4k, QD64, 16Job)  12%
>> 
>>   Compared to the original polling, the performance reduction under
>>   peak workload is within 1%:
>> 
>>    read  0.29%     write  0.51%    randread  0.09%    randwrite  0%
>
>As mentioned, this is like a reworked version of the old hybrid polling
>we had. The feature itself may make sense, but there's a slew of things
>in this patch that aren't really acceptable. More below.

Thank you very much for your patient review and corrections. I will
address these issues as soon as possible and submit a v3 patch.
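
To illustrate the timing idea from the patch description quoted above,
here is a minimal userspace sketch. It is not the code from the patch:
the struct, the function names, and the 1/8 averaging weight are all
hypothetical, chosen only to show how a recorded IO time and a context
switch cost could decide between sleeping and polling immediately.

```c
#include <stdint.h>

/* Hypothetical per-ring statistics; names are illustrative, not from the patch. */
struct hybrid_stats {
	uint64_t avg_io_ns;     /* running average of observed IO completion time */
	uint64_t ctx_switch_ns; /* estimated cost of sleeping and being woken up */
};

/* Fold a new sample into the running average (assumed weight: 1/8 for the new sample). */
static void hybrid_record(struct hybrid_stats *s, uint64_t sample_ns)
{
	if (s->avg_io_ns == 0)
		s->avg_io_ns = sample_ns;
	else
		s->avg_io_ns = (s->avg_io_ns * 7 + sample_ns) / 8;
}

/* Return how long to sleep before polling; 0 means poll immediately. */
static uint64_t hybrid_sleep_ns(const struct hybrid_stats *s)
{
	/* Sleeping only pays off if the IO outlasts the context switch cost. */
	if (s->avg_io_ns <= s->ctx_switch_ns)
		return 0;
	return s->avg_io_ns - s->ctx_switch_ns;
}
```

Under these assumptions, a fast device whose IO time is below the
context switch cost always gets a sleep time of 0 and is polled as
before, while a slower device gives up the CPU for most of each IO.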