Message-ID: <26e38785-4f48-4801-a8c1-895bf8d78f7a@gmail.com>
Date: Fri, 19 Apr 2024 13:27:45 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: hexue <xue01.he@...sung.com>, axboe@...nel.dk
Cc: linux-kernel@...r.kernel.org, io-uring@...r.kernel.org,
 peiwei.li@...sung.com, joshi.k@...sung.com, kundan.kumar@...sung.com,
 anuj20.g@...sung.com, wenwen.chen@...sung.com, ruyi.zhang@...sung.com,
 xiaobing.li@...sung.com, cliang01.li@...sung.com
Subject: Re: [PATCH v2] io_uring: releasing CPU resources when polling

On 4/18/24 10:31, hexue wrote:
> This patch is intended to release the CPU resources of io_uring in
> polling mode. When IO is issued, the program immediately polls to
> check for completion, which wastes CPU resources while the IO
> commands are still executing on the disk.
> 
> I add a hybrid polling feature to io_uring, which enables polling to
> release a portion of CPU resources without affecting the block layer.

So that's basically the block layer's hybrid polling which, as a
reminder, was removed not that long ago, now moved into io_uring instead.
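
For context, the removed blk-mq implementation slept for a while (half
of the observed mean completion time by default) before starting to
spin. A rough sketch of the idea, using hypothetical helpers rather
than real kernel APIs:

    /* Sketch only: hybrid polling sleeps first, then busy-polls. */
    static void hybrid_poll_wait(struct request *rq, u64 expected_ns)
    {
            /* give up the CPU for ~half the expected device latency */
            sleep_ns(expected_ns / 2);          /* hypothetical helper */

            /* then busy-poll for the remainder */
            while (!request_done(rq))           /* hypothetical helper */
                    cpu_relax();
    }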

> - Record the running time and context switching time of each
>    IO, and use these times to determine whether the process should
>    continue to schedule (a sketch of this decision follows the list).
> 
> - Adaptively adjust to different devices. Because the timing is
>    recorded in real time and each device's IO processing speed
>    differs, the CPU optimization effect will vary.
> 
> - Add an interface (ctx->flag) that enables the application to
>    choose whether or not to use this feature.
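
The list above implies per-device timing statistics feeding a
schedule-out decision. A rough sketch of that logic (the names are
illustrative, not taken from the patch):

    struct hybrid_stats {
            u64 avg_io_ns;          /* running average IO completion time */
            u64 ctx_switch_ns;      /* measured context-switch cost */
    };

    /* Schedule out only if the expected remaining wait clearly exceeds
     * the round-trip cost of switching away and back again. */
    static bool should_schedule_out(struct hybrid_stats *s, u64 elapsed_ns)
    {
            return elapsed_ns < s->avg_io_ns &&
                   s->avg_io_ns - elapsed_ns > 2 * s->ctx_switch_ns;
    }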
> 
> The CPU optimization of this patch under peak workload was tested as
>    follows: with original polling, CPU utilization is 100% for each
>    CPU; after optimization, per-CPU utilization drops considerably:

The first version was about cases that don't have iopoll queues.
How many IO poll queues did you have to get these numbers?
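
For reference, reproducing such numbers needs a ring created with
IORING_SETUP_IOPOLL, O_DIRECT file IO, and a device that actually
exposes poll queues (e.g. booting with nvme.poll_queues=N). A minimal
liburing sketch, with error handling omitted and the device path being
an assumption:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <liburing.h>

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            static char buf[4096] __attribute__((aligned(4096)));

            int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
            io_uring_queue_init(64, &ring, IORING_SETUP_IOPOLL);

            sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
            io_uring_submit(&ring);

            /* with IOPOLL, waiting spins polling the device instead of
             * sleeping on an interrupt-driven completion */
            io_uring_wait_cqe(&ring, &cqe);
            io_uring_cqe_seen(&ring, cqe);
            io_uring_queue_exit(&ring);
            return 0;
    }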


>     read(128k, QD64, 1Job)     37%   write(128k, QD64, 1Job)     40%
>     randread(4k, QD64, 16Job)  52%   randwrite(4k, QD64, 16Job)  12%
> 
>    Compared to original polling, the performance reduction with the
>    optimization under peak workload is within 1%:
> 
>     read  0.29%     write  0.51%    randread  0.09%    randwrite  0%
> 
> Reviewed-by: KANCHAN JOSHI <joshi.k@...sung.com>

Kanchan, did you _really_ take a look at the patch?

> Signed-off-by: hexue <xue01.he@...sung.com>
> ---
>   include/linux/io_uring_types.h | 10 +++++
>   include/uapi/linux/io_uring.h  |  1 +
>   io_uring/io_uring.c            | 28 +++++++++++++-
>   io_uring/io_uring.h            |  2 +
>   io_uring/rw.c                  | 69 ++++++++++++++++++++++++++++++++++
>   5 files changed, 109 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 854ad67a5f70..7607fd8de91c 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -224,6 +224,11 @@ struct io_alloc_cache {
>   	size_t			elem_size;
>   };


-- 
Pavel Begunkov
