Message-ID: <9e2b679c-fc1e-3d83-2303-e053f330a21a@gmail.com>
Date: Thu, 9 Nov 2023 16:14:50 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Xiaobing Li <xiaobing.li@...sung.com>, axboe@...nel.dk
Cc: linux-kernel@...r.kernel.org, io-uring@...r.kernel.org,
kun.dou@...sung.com, peiwei.li@...sung.com, joshi.k@...sung.com,
kundan.kumar@...sung.com, wenwen.chen@...sung.com,
ruyi.zhang@...sung.com
Subject: Re: [PATCH v2] io_uring: Statistics of the true utilization of sq
threads.
On 11/8/23 08:07, Xiaobing Li wrote:
> Since the sq thread runs in a while(1) loop, there can be long
> stretches where it is not processing IO but has not yet exceeded the
> idle timeout, so the sqpoll thread keeps running and keeps occupying
> the CPU; that CPU time is simply wasted. Our goal is to count the part
> of the time that the sqpoll thread actually spends processing IO, so
> as to reflect the share of CPU it uses for IO, which can then be used
> to help improve actual CPU utilization in the future.
Let's pull the elephant out of the room, what's the use case? "Improve
in the future" doesn't sound too convincing. If it's a future kernel
change you have in mind, it has to go together with this patch. If it's
a userspace application, it'd be interesting to hear what that is,
especially if you have numbers ready.
And another classic question, why can't it be done with bpf?
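For illustration, something along these lines can already be prototyped
without kernel changes, e.g. a kprobe/kretprobe pair timing each
__io_sq_thread() call per sqpoll task. A minimal, untested bpf-side
sketch (it assumes __io_sq_thread is visible to kprobes, i.e. not
inlined, it measures wall time per call rather than CPU time so it
shares the caveats discussed further down, and all names are made up):

/* Build with clang -target bpf against a generated vmlinux.h */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, u32);	/* sqpoll thread tid */
	__type(value, u64);	/* entry timestamp, ns */
} start_ns SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, u32);	/* sqpoll thread tid */
	__type(value, u64);	/* accumulated time in __io_sq_thread, ns */
} work_ns SEC(".maps");

SEC("kprobe/__io_sq_thread")
int BPF_KPROBE(sqt_enter)
{
	u32 tid = (u32)bpf_get_current_pid_tgid();
	u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start_ns, &tid, &ts, BPF_ANY);
	return 0;
}

SEC("kretprobe/__io_sq_thread")
int BPF_KRETPROBE(sqt_exit)
{
	u32 tid = (u32)bpf_get_current_pid_tgid();
	u64 *ts = bpf_map_lookup_elem(&start_ns, &tid);
	u64 delta, *sum;

	if (!ts)
		return 0;
	delta = bpf_ktime_get_ns() - *ts;
	bpf_map_delete_elem(&start_ns, &tid);

	sum = bpf_map_lookup_elem(&work_ns, &tid);
	if (sum)
		__sync_fetch_and_add(sum, delta);
	else
		bpf_map_update_elem(&work_ns, &tid, &delta, BPF_ANY);
	return 0;
}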
> Signed-off-by: Xiaobing Li <xiaobing.li@...sung.com>
>
> v1 -> v2: Added method to query data.
>
...
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..c821273406bd 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -224,6 +224,7 @@ static int io_sq_thread(void *data)
> struct io_ring_ctx *ctx;
> unsigned long timeout = 0;
> char buf[TASK_COMM_LEN];
> + unsigned long start, begin, end;
start and begin are used for slightly different kinds of accounting,
which will confuse anyone reading this.
> DEFINE_WAIT(wait);
>
> snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
> @@ -235,6 +236,7 @@ static int io_sq_thread(void *data)
> set_cpus_allowed_ptr(current, cpu_online_mask);
>
> mutex_lock(&sqd->lock);
> + start = jiffies;
> while (1) {
> bool cap_entries, sqt_spin = false;
>
> @@ -245,12 +247,18 @@ static int io_sq_thread(void *data)
> }
>
> cap_entries = !list_is_singular(&sqd->ctx_list);
> + begin = jiffies;
There can be a {hard,soft}irq between the two jiffies reads, and the
thread can even be scheduled out in favour of another process, so this
would accumulate a lot of garbage. There should be a per-task stat for
system time you can use instead:
start = get_system_time(current);
do_io_part();
sq->total_time += get_system_time(current) - start;
wait();
...
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> int ret = __io_sq_thread(ctx, cap_entries);
>
> if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
> sqt_spin = true;
> }
> + end = jiffies;
> + sqd->total = end - start;
...and then you don't need to track total at all, it'd be your
total = get_system_time(sq_thread /* current */);
at any given point in time.
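For reference, get_system_time() above is pseudocode; one plausible
mapping (untested, helper name made up) is the task's accumulated CPU
clock, e.g. task_sched_runtime(), which for a kernel thread like the
sqpoll thread is effectively all system time:

/*
 * Illustrative sketch only: with a per-task CPU clock there is no need
 * for a separate "total" counter, whoever reports the stats can read
 * it on demand.
 */
static u64 io_sq_cpu_time(struct io_sq_data *sqd)
{
	/* sqd->thread is the sqpoll task; caller must keep it alive */
	return task_sched_runtime(sqd->thread);
}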
> + if (sqt_spin == true)
> + sqd->work += end - begin;
It should go after the io_run_task_work() below; task_work is a major
part of request execution.
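Putting both points together, a rough and untested sketch of how the
loop body accounting could look (sqd->work_time and the use of
task_sched_runtime() as the per-task CPU clock are illustrative, not a
final choice):

		u64 cpu_start = task_sched_runtime(current);

		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
			int ret = __io_sq_thread(ctx, cap_entries);

			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
				sqt_spin = true;
		}

		/* task_work is part of the busy time, so sample after it runs */
		if (io_run_task_work())
			sqt_spin = true;

		if (sqt_spin)
			sqd->work_time += task_sched_runtime(current) - cpu_start;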
> +
> if (io_run_task_work())
> sqt_spin = true;
>
> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
> index 8df37e8c9149..0aa4e2efa4db 100644
> --- a/io_uring/sqpoll.h
> +++ b/io_uring/sqpoll.h
> @@ -16,6 +16,8 @@ struct io_sq_data {
> pid_t task_pid;
> pid_t task_tgid;
>
> + unsigned long work;
> + unsigned long total;
> unsigned long state;
> struct completion exited;
> };
--
Pavel Begunkov