Message-Id: <20230928022228.15770-1-xiaobing.li@samsung.com>
Date: Thu, 28 Sep 2023 10:22:25 +0800
From: Xiaobing Li <xiaobing.li@...sung.com>
To: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com, axboe@...nel.dk,
asml.silence@...il.com
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
io-uring@...r.kernel.org, kun.dou@...sung.com,
peiwei.li@...sung.com, joshi.k@...sung.com,
kundan.kumar@...sung.com, wenwen.chen@...sung.com,
ruyi.zhang@...sung.com, Xiaobing Li <xiaobing.li@...sung.com>
Subject: [PATCH 0/3] Sq thread real utilization statistics.
Summary:
The PELT (Per-Entity Load Tracking) algorithm in the current kernel
computes a thread's utilization from its running time. For some threads,
however, such as the sq thread in io_uring, this can hide wasted CPU
resources.
Because the sq thread runs a while(1) loop, there may be long stretches
in which no IO is processed but the idle timeout (sq_thread_idle) has not
yet expired. The sqpoll thread keeps running during that time and
occupies the CPU, so that CPU time is wasted.
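To illustrate why running time alone overstates the sq thread's useful
work, here is a simplified floating-point model of PELT (the kernel
itself uses fixed-point arithmetic and precomputed decay tables). Time is
divided into 1024 us periods, each period contributes the fraction of it
the thread spent running, and older contributions decay by a factor y
with y^32 = 0.5. The function name pelt_util() and the normalization by
the series limit are assumptions made for this sketch only.

```c
#include <assert.h>

/*
 * Simplified floating-point model of PELT, not the kernel's fixed-point code.
 * Each 1024 us period contributes the fraction of it the thread spent running;
 * older periods decay by PELT_Y, chosen so that PELT_Y^32 = 0.5.
 */
#define PELT_Y     0.978572062087700 /* approximately 2^(-1/32) */
#define PELT_SCALE 1024.0            /* utilization of an always-running thread */

/*
 * Utilization after `periods` consecutive periods in which the thread ran for
 * `running_fraction` (0.0 to 1.0) of every period, starting from zero history.
 */
double pelt_util(double running_fraction, int periods)
{
    assert(running_fraction >= 0.0 && running_fraction <= 1.0);

    double sum = 0.0;
    for (int i = 0; i < periods; i++)
        sum = sum * PELT_Y + running_fraction; /* decay history, add new period */

    /* Normalize by the geometric-series limit 1 / (1 - y), scaled to 1024. */
    return sum * (1.0 - PELT_Y) * PELT_SCALE;
}
```

A thread that spins for the whole idle timeout has running_fraction =
1.0, so pelt_util(1.0, 1000) comes out at roughly 1024, the maximum,
whether it was handling IO or just polling empty rings; the model has no
input that distinguishes the two.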
Our goal is to count the time the sqpoll thread actually spends
processing IO, which shows how much of its CPU time goes to IO. This can
be used in the future to help improve actual CPU utilization.
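A minimal user-space sketch of the accounting idea, assuming the polling
loop can tell after each iteration whether it handled any IO. Every
iteration is charged to the total time, and only iterations that did
work are charged to the work time. struct poll_stats, poll_account(),
poll_loop() and poll_real_util_pct() are hypothetical names for
illustration; they are not the interfaces added by these patches.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-thread statistics for a busy-polling loop. */
struct poll_stats {
    uint64_t total_ns; /* all time spent in loop iterations */
    uint64_t work_ns;  /* time spent in iterations that handled IO */
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Charge one iteration [start_ns, end_ns) to the statistics. */
void poll_account(struct poll_stats *st, uint64_t start_ns, uint64_t end_ns,
                  int did_work)
{
    assert(end_ns >= start_ns);
    uint64_t delta = end_ns - start_ns;
    st->total_ns += delta;
    if (did_work)
        st->work_ns += delta;
}

/*
 * Run `iterations` rounds of a polling loop. handle_io() stands in for the
 * real submission handling and returns how many requests it processed.
 */
void poll_loop(struct poll_stats *st, int (*handle_io)(void *), void *ctx,
               int iterations)
{
    for (int i = 0; i < iterations; i++) {
        uint64_t start = now_ns();
        int handled = handle_io(ctx);
        poll_account(st, start, now_ns(), handled > 0);
    }
}

/* Share of the loop's time spent on IO, in whole percent (0 to 100). */
unsigned int poll_real_util_pct(const struct poll_stats *st)
{
    if (st->total_ns == 0)
        return 0;
    return (unsigned int)(st->work_ns * 100 / st->total_ns);
}
```

The sketch classifies whole iterations: one that handled at least one
request counts entirely as work, and an empty one counts entirely as
idle. An iteration lasting 300 ns with work followed by 700 ns without
gives a real utilization of 30%, even though the thread was running for
all 1000 ns.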
The modifications to the scheduling module also apply to other threads
with similar needs.
We used fio (version 3.28) to measure performance. In the experiments,
an fio process is treated as one application, and it starts a job with
sq_poll enabled. The tests were run on a host with 256 CPUs and 64 GB of
memory, with IO issued to a PM1743 SSD, running Ubuntu 22.04 with kernel
6.4.0.
Parameters for sequential read and write: bs=128k, numjobs=1, iodepth=64.
Parameters for random read and write: bs=4k, numjobs=16, iodepth=64.
The test results are as follows:

Before modification:
              read    write   randread   randwrite
IOPS (K)      53.7    46.1    849        293
BW (MB/s)     7033    6037    3476       1199

After modification:
              read    write   randread   randwrite
IOPS (K)      53.7    46.1    847        293
BW (MB/s)     7033    6042    3471       1199
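The relative differences can be computed directly from the two tables.
The sketch below also cross-checks the bandwidth row against IOPS times
block size, assuming bs is in KiB and MB/s means 10^6 bytes per second;
under that assumption 849K IOPS at 4 KiB gives about 3477.5 MB/s, which
matches the reported 3476 MB/s within the rounding of the IOPS figure.
The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before/after pairs from the tables above (IOPS in K, BW in MB/s). */
static const double results[][2] = {
    {53.7, 53.7}, {46.1, 46.1}, {849, 847},   {293, 293},   /* IOPS */
    {7033, 7033}, {6037, 6042}, {3476, 3471}, {1199, 1199}, /* BW */
};

/* Relative change from `before` to `after`, in percent. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Largest absolute relative change across all measurements, in percent. */
double largest_abs_change(void)
{
    double max = 0.0;
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
        double c = pct_change(results[i][0], results[i][1]);
        if (c < 0.0)
            c = -c;
        if (c > max)
            max = c;
    }
    return max;
}

/*
 * Bandwidth in MB/s (10^6 bytes per second) implied by an IOPS figure given
 * in thousands and a block size given in KiB.
 */
double implied_bw_mbs(double kiops, double bs_kib)
{
    return kiops * 1000.0 * bs_kib * 1024.0 / 1e6;
}
```

For instance, pct_change(849, 847) is about -0.24%, the largest
difference in the tables (randread IOPS), and pct_change(6037, 6042) is
about +0.08%.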
The results show that the modifications have almost no impact on
performance.
Xiaobing Li (3):
SCHEDULER: Add an interface for counting real utilization.
PROC FILESYSTEM: Add real utilization data of sq thread.
IO_URING: Statistics of the true utilization of sq threads.
fs/proc/stat.c | 25 ++++++++++++++++++++++++-
include/linux/kernel.h | 7 ++++++-
include/linux/kernel_stat.h | 3 +++
include/linux/sched.h | 1 +
io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
kernel/sched/cputime.c | 36 +++++++++++++++++++++++++++++++++++-
kernel/sched/pelt.c | 14 ++++++++++++++
7 files changed, 108 insertions(+), 4 deletions(-)
--
2.34.1