Message-Id: <cover.1615517202.git.brookxu@tencent.com>
Date:   Fri, 12 Mar 2021 19:08:34 +0800
From:   brookxu <brookxu.cn@...il.com>
To:     paolo.valente@...aro.org, axboe@...nel.dk, tj@...nel.org
Cc:     linux-block@...r.kernel.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [RFC PATCH v2 00/11] bfq: introduce bfq.ioprio for cgroup

From: Chunguang Xu <brookxu@...cent.com>

Tasks in a production environment can be roughly divided into
three categories: emergency tasks, ordinary tasks and offline
tasks. Emergency tasks, such as system agents, need to be
scheduled in real time. Offline tasks, such as background jobs,
need no QoS guarantee but can improve resource utilization while
the system is otherwise idle. Meeting these requirements calls for
IO preemption. At present we can approximate IO preemption with
weights, but weights describe proportional sharing rather than
preemption, so the approximation is poor. For example, it is hard
to choose suitable weights for emergency and ordinary tasks;
offline tasks with the same weight actually consume different
amounts of resources on disks with different performance; and the
tail latency caused by offline tasks cannot be well controlled.
The preemption semantics of ioprio classes solve these problems.
Since ioprio is eventually converted to a weight, ioprio alone also
provides weight isolation within the same class, and bfq.weight can
still be used on top of it for finer IO QoS control.
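To make the weight argument concrete, the sketch below models how an
ioprio level maps to a BFQ weight and what share a backlogged entity
then receives. The constants (8 levels, coefficient 10) reflect our
understanding of mainline BFQ's bfq_ioprio_to_weight() at the time and
should be treated as assumptions; the function names are illustrative.

```c
#include <stddef.h>

/* Assumed constants, modeled on mainline BFQ: eight priority levels
 * and a weight conversion coefficient of 10. */
#define IOPRIO_LEVELS 8
#define WEIGHT_CONVERSION_COEFF 10

/* Map an ioprio level (0 = highest, 7 = lowest) to a weight:
 * level 0 -> 80, level 7 -> 10. */
unsigned int ioprio_to_weight(unsigned int level)
{
	return (IOPRIO_LEVELS - level) * WEIGHT_CONVERSION_COEFF;
}

/* Share of bandwidth that backlogged entity i receives under
 * proportional sharing: its weight over the sum of all weights.
 * The share is never zero for a nonzero weight, which is why
 * weights alone cannot express preemption. */
double weighted_share(const unsigned int *weights, size_t n, size_t i)
{
	unsigned long sum = 0;

	for (size_t k = 0; k < n; k++)
		sum += weights[k];
	return sum ? (double)weights[i] / (double)sum : 0.0;
}
```

With an emergency task at level 0 (weight 80) and an offline task at
level 7 (weight 10), the offline task still receives 10/90, about 11%,
of the bandwidth whenever both are backlogged. Proportional sharing
never drops that share to zero; class preemption does.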

However, the class of a bfq_group is currently always BE, so a
task's ioprio class only takes effect within its own cgroup, and we
cannot guarantee that real-time tasks in a cgroup are scheduled in
time. Therefore, we introduce bfq.ioprio, which allows an ioprio
class to be configured for a cgroup. This ensures that the
real-time tasks of a cgroup are scheduled in time, and it also
makes offline task groups simpler to handle.
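A minimal sketch of the difference, using hypothetical structures
rather than BFQ's own: backlogged groups are compared by their
configured class first (a lower value is a higher priority), and
class NONE is treated as BE, which is what every group effectively
is today.

```c
#include <stddef.h>

/* Linux ioprio class numbering: a lower value is a higher priority. */
enum ioprio_class { CLASS_NONE = 0, CLASS_RT = 1, CLASS_BE = 2, CLASS_IDLE = 3 };

/* Hypothetical, simplified view of a cgroup's scheduling entity. */
struct group {
	const char *name;
	enum ioprio_class cls; /* class configured via bfq.ioprio */
	int backlogged;        /* nonzero if the group has pending IO */
};

/* A group without a configured class behaves as BE. */
static enum ioprio_class effective_class(enum ioprio_class c)
{
	return c == CLASS_NONE ? CLASS_BE : c;
}

/* Index of the backlogged group with the highest-priority class, or -1
 * if none is backlogged. Ties keep the first group; a real scheduler
 * would then choose among them by weight and virtual time. */
int pick_group(const struct group *g, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!g[i].backlogged)
			continue;
		if (best < 0 ||
		    effective_class(g[i].cls) < effective_class(g[best].cls))
			best = (int)i;
	}
	return best;
}
```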

The bfq.ioprio interface is now available for both cgroup v1 and
cgroup v2. Users can configure the ioprio of a cgroup through this
interface, as shown below:

echo "1 2" > blkio.bfq.ioprio
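The write above can be modeled as parsing two integers and packing
them the way the kernel's IOPRIO_PRIO_VALUE() macro does (class in the
bits above IOPRIO_CLASS_SHIFT, which is 13). The function names and the
accepted ranges are assumptions for illustration, not the patch's
actual parser.

```c
#include <stdio.h>

#define IOPRIO_CLASS_SHIFT 13 /* as in include/uapi/linux/ioprio.h */
#define IOPRIO_LEVELS 8       /* levels 0..7 */

/* Parse the "class level" string written to blkio.bfq.ioprio.
 * Returns 0 on success, -1 if malformed or out of range.
 * Assumed ranges: class 0..3 (NONE, RT, BE, IDLE), level 0..7. */
int parse_bfq_ioprio(const char *buf, int *cls, int *level)
{
	int c, l;

	if (sscanf(buf, "%d %d", &c, &l) != 2)
		return -1;
	if (c < 0 || c > 3 || l < 0 || l >= IOPRIO_LEVELS)
		return -1;
	*cls = c;
	*level = l;
	return 0;
}

/* Pack class and level like IOPRIO_PRIO_VALUE(class, data). */
unsigned int ioprio_value(int cls, int level)
{
	return ((unsigned int)cls << IOPRIO_CLASS_SHIFT) | (unsigned int)level;
}
```

For "1 2" this yields class 1 (RT), level 2, packed as
(1 << 13) | 2 = 8194.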

The two values are, respectively, the ioprio class and the ioprio
level of the cgroup. All tasks within the cgroup then use the
cgroup's ioprio. If the cgroup's ioprio is disabled, each task keeps
its own ioprio, which usually comes from its io_context.
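The rule for a task's effective ioprio can be sketched as follows. We
assume here that a cgroup class of 0 (IOPRIO_CLASS_NONE) means the
cgroup ioprio is disabled; the struct and function are hypothetical.

```c
/* Hypothetical (class, level) pair. */
struct prio {
	int cls;
	int level;
};

/* Assumption: cgroup class 0 (NONE) means "disabled". */
struct prio effective_ioprio(struct prio cgroup_prio, struct prio task_prio)
{
	if (cgroup_prio.cls == 0)
		return task_prio;   /* disabled: keep the task's own ioprio */
	return cgroup_prio;     /* enabled: every task uses the cgroup's */
}
```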

When testing with fio and fio_generate_plots, we can clearly see
that IO latency follows the class priority RT > BE > IDLE, with RT
tasks seeing the lowest latency. While RT is running, BE and IDLE
are still guaranteed a minimum bandwidth. Combined with bfq.weight,
resources can also be isolated within the same class.
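One way to combine strict class priority with a minimum-bandwidth
guarantee is a starvation guard: a backlogged lower class that has
waited a given number of rounds gets the next dispatch. This is a
hypothetical model, not the series' implementation, and the per-class
limits are invented knobs (as far as we know, mainline BFQ uses a
similar timeout for the idle class, BFQ_CL_IDLE_TIMEOUT).

```c
#define NCLASSES 3 /* indices: 0 = RT, 1 = BE, 2 = IDLE */

/* Pick the class to serve this round.
 * backlogged[c]: nonzero if class c has pending IO.
 * waited[c]:     rounds class c has waited while backlogged (updated).
 * limit[c]:      hypothetical starvation limit for class c.
 * Returns the chosen class index, or -1 if nothing is backlogged. */
int pick_class(const int backlogged[NCLASSES], int waited[NCLASSES],
	       const int limit[NCLASSES])
{
	int chosen = -1;

	/* Starvation guard, lowest-priority class first. limit[0] is
	 * unused because RT is never starved by a higher class. */
	for (int c = NCLASSES - 1; c > 0 && chosen < 0; c--)
		if (backlogged[c] && waited[c] >= limit[c])
			chosen = c;

	/* Otherwise strict priority: the first backlogged class wins. */
	for (int c = 0; c < NCLASSES && chosen < 0; c++)
		if (backlogged[c])
			chosen = c;

	/* Reset the served class; age the others that are backlogged. */
	for (int c = 0; c < NCLASSES; c++) {
		if (c == chosen)
			waited[c] = 0;
		else if (backlogged[c])
			waited[c]++;
	}
	return chosen;
}
```

With all three classes backlogged and limits of 4 rounds for BE and 8
for IDLE, RT receives most dispatches while BE and IDLE still make
progress, in that order.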

The test process is as follows:
# prepare data disk
mount /dev/sdb /data1

# create cgroup v1 hierarchy
cd /sys/fs/cgroup/blkio
mkdir rt be idle
echo "1 0" > rt/blkio.bfq.ioprio
echo "2 0" > be/blkio.bfq.ioprio
echo "3 0" > idle/blkio.bfq.ioprio

# run fio test
fio fio.ini

# generate svg graph
fio_generate_plots res

The contents of fio.ini are as follows:
[global]
ioengine=libaio
group_reporting=1
log_avg_msec=500
direct=1
time_based=1
iodepth=16
size=100M
rw=write
bs=1M
[rt]
name=rt
write_bw_log=rt
write_lat_log=rt
write_iops_log=rt
filename=/data1/rt.bin
cgroup=rt
runtime=30s
nice=-10
[be]
name=be
new_group
write_bw_log=be
write_lat_log=be
write_iops_log=be
filename=/data1/be.bin
cgroup=be
runtime=60s
[idle]
name=idle
new_group
write_bw_log=idle
write_lat_log=idle
write_iops_log=idle
filename=/data1/idle.bin
cgroup=idle
runtime=90s

V2:
1. Optimise bfq_select_next_class().
2. Introduce bfq_group[] to track the number of groups for each CLASS.
3. Optimise IO injection, EMQ and Idle mechanism for CLASS_RT.
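Item 2 can be illustrated with per-class counters of active groups:
the next class is the first one whose counter is nonzero, so choosing
it does not require inspecting every group. The structure and function
names are hypothetical and are not the patch's bfq_group[] code.

```c
#define NCLASSES 3 /* 0 = RT, 1 = BE, 2 = IDLE */

/* Hypothetical count of active groups per class. */
struct class_counts {
	unsigned int active[NCLASSES];
};

void group_activate(struct class_counts *cc, int cls)
{
	cc->active[cls]++;
}

void group_deactivate(struct class_counts *cc, int cls)
{
	if (cc->active[cls] > 0)
		cc->active[cls]--;
}

/* Highest-priority class with at least one active group, or -1. */
int select_next_class(const struct class_counts *cc)
{
	for (int c = 0; c < NCLASSES; c++)
		if (cc->active[c])
			return c;
	return -1;
}
```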

Chunguang Xu (11):
  bfq: introduce bfq_entity_to_bfqg helper method
  bfq: limit the IO depth of idle_class to 1
  bfq: keep the minimun bandwidth for be_class
  bfq: expire other class if CLASS_RT is waiting
  bfq: optimse IO injection for CLASS_RT
  bfq: disallow idle if CLASS_RT waiting for service
  bfq: disallow merge CLASS_RT with other class
  bfq: introduce bfq.ioprio for cgroup
  bfq: convert the type of bfq_group.bfqd to bfq_data*
  bfq: remove unnecessary initialization logic
  bfq: optimize the calculation of bfq_weight_to_ioprio()

 block/bfq-cgroup.c  |  99 +++++++++++++++++++++++++++++++----
 block/bfq-iosched.c |  47 ++++++++++++++---
 block/bfq-iosched.h |  28 ++++++++--
 block/bfq-wf2q.c    | 124 +++++++++++++++++++++++++++++++++-----------
 4 files changed, 244 insertions(+), 54 deletions(-)

-- 
2.30.0
