Date:   Thu, 9 Nov 2023 16:14:50 +0000
From:   Pavel Begunkov <asml.silence@...il.com>
To:     Xiaobing Li <xiaobing.li@...sung.com>, axboe@...nel.dk
Cc:     linux-kernel@...r.kernel.org, io-uring@...r.kernel.org,
        kun.dou@...sung.com, peiwei.li@...sung.com, joshi.k@...sung.com,
        kundan.kumar@...sung.com, wenwen.chen@...sung.com,
        ruyi.zhang@...sung.com
Subject: Re: [PATCH v2] io_uring: Statistics of the true utilization of sq
 threads.

On 11/8/23 08:07, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, there may be long stretches
> during which it is not processing IO but has not yet exceeded the
> timeout period, so the sqpoll thread keeps running and keeps occupying
> the CPU. That CPU time is wasted. Our goal is to count the part of the
> time the sqpoll thread actually spends processing IO, so as to show how
> much of the CPU it uses for IO; this can be used to help improve actual
> CPU utilization in the future.

Let's address the elephant in the room: what's the use case? "Improve
in the future" doesn't sound too convincing. If it's a future kernel
change you have in mind, it has to go together with this patch. If it's
a userspace application, it'd be interesting to hear what it is,
especially if you have numbers ready.

And another classic question, why can't it be done with bpf?


> Signed-off-by: Xiaobing Li <xiaobing.li@...sung.com>
> 
> v1 -> v2: Added method to query data.
> 
...
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..c821273406bd 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -224,6 +224,7 @@ static int io_sq_thread(void *data)
>   	struct io_ring_ctx *ctx;
>   	unsigned long timeout = 0;
>   	char buf[TASK_COMM_LEN];
> +	unsigned long start, begin, end;

start and begin are used for only slightly different accounting;
that will confuse anyone.

>   	DEFINE_WAIT(wait);
>   
>   	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
> @@ -235,6 +236,7 @@ static int io_sq_thread(void *data)
>   		set_cpus_allowed_ptr(current, cpu_online_mask);
>   
>   	mutex_lock(&sqd->lock);
> +	start = jiffies;
>   	while (1) {
>   		bool cap_entries, sqt_spin = false;
>   
> @@ -245,12 +247,18 @@ static int io_sq_thread(void *data)
>   		}
>   
>   		cap_entries = !list_is_singular(&sqd->ctx_list);
> +		begin = jiffies;

There can be {hard,soft}irqs between the jiffies reads, and the thread can
even be scheduled out in favour of another process, so it would collect a
lot of garbage. There should be a per-task stat for system time that you
can use:

start = get_system_time(current);
do_io_part();
sqd->total_time += get_system_time(current) - start;
wait();

...

>   		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>   			int ret = __io_sq_thread(ctx, cap_entries);
>   
>   			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
>   				sqt_spin = true;
>   		}
> +		end = jiffies;
> +		sqd->total = end - start;

...and then you don't need to track total at all; it would simply be

total = get_system_time(sq_thread /* current */);

at any given point in time.


> +		if (sqt_spin == true)
> +			sqd->work += end - begin;

It should go after the io_run_task_work() call below; task_work is a
major part of request execution.

> +
>   		if (io_run_task_work())
>   			sqt_spin = true;
>   
> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
> index 8df37e8c9149..0aa4e2efa4db 100644
> --- a/io_uring/sqpoll.h
> +++ b/io_uring/sqpoll.h
> @@ -16,6 +16,8 @@ struct io_sq_data {
>   	pid_t			task_pid;
>   	pid_t			task_tgid;
>   
> +	unsigned long       work;
> +	unsigned long       total;
>   	unsigned long		state;
>   	struct completion	exited;
>   };

-- 
Pavel Begunkov
