Message-ID: <CAFpoUr2mNO87XFAyHF=HA3f6KC8EkuGrwQQe54q4kmF1WgfG7w@mail.gmail.com>
Date: Thu, 20 May 2021 16:00:29 +0200
From: Odin Ugedal <odin@...d.al>
To: Huaixin Chang <changhuaixin@...ux.alibaba.com>
Cc: Benjamin Segall <bsegall@...gle.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
dtcccc@...ux.alibaba.com, Juri Lelli <juri.lelli@...hat.com>,
khlebnikov@...dex-team.ru,
open list <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...hat.com>,
Odin Ugedal <odin@...d.al>, pauld@...head.com,
Peter Zijlstra <peterz@...radead.org>,
Paul Turner <pjt@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
shanpeic@...ux.alibaba.com, Tejun Heo <tj@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
xiyou.wangcong@...il.com
Subject: Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller
Hi,
Here are some more thoughts and questions:
> The benefit of burst is seen when testing with schbench:
>
> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
> echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
> echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
>
> # The average CPU usage is around 500%, which is 200ms CPU time
> # every 40ms.
> ./schbench -m 1 -t 30 -r 10 -c 10000 -R 500
>
> Without burst:
>
> Latency percentiles (usec)
> 50.0000th: 7
> 75.0000th: 8
> 90.0000th: 9
> 95.0000th: 10
> *99.0000th: 933
> 99.5000th: 981
> 99.9000th: 3068
> min=0, max=20054
> rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%
It should be noted that this was run on a 64-core machine (if that was
the case, ref. your previous patch).
I am curious how much you have tried tweaking both the period and the
quota for this workload. I assume a longer period can help such a bursty
application, and given the small slowdowns, a slightly higher quota could
also help. I am not saying this is a bad idea, but we need to understand
what it fixes, and how, in order to understand how/if to use it.
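To be concrete, something along these lines is what I have in mind (the
values are purely illustrative, using the same cgroup v1 paths as in your
test above):

  # keep the 6-CPU ratio but with a longer period:
  echo 3000000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
  echo 500000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us

  # or a slightly higher quota at the original period:
  echo 700000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
  echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us

Seeing the same schbench numbers for a couple of such combinations,
without burst, would make the comparison easier.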
Also, what value of the sysctl kernel.sched_cfs_bandwidth_slice_us are
you using? Which CONFIG_HZ you are using is also interesting, due to how
bandwidth is accounted. There is some more info about it in
Documentation/scheduler/sched-bwc.rst. I assume a smaller slice value may
also help, and it would be interesting to see what implications that has.
A high ratio of threads to (quota/period), together with a large
bandwidth slice, will probably cause some throttling, so one has to
choose between precision and overhead.
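For reference, these are the knobs I mean (the config file path is just
where my distro puts it, it may differ on yours):

  # current bandwidth slice; the default is 5000us (5ms)
  sysctl kernel.sched_cfs_bandwidth_slice_us

  # timer frequency of the running kernel
  grep CONFIG_HZ= /boot/config-$(uname -r)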
Also, here you give a burst of 66% of the quota. Would that be a typical
value for a cgroup, or is it just a result of testing? As I understand
this patchset, your example would allow 600% constant CPU load, then one
period with 1000% load, then another "long set" of periods with 600%
load. Have you discussed a way of limiting how long burst can be "saved"
before expiring?
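Spelling out my understanding of the arithmetic for your example (please
correct me if I am reading the patch wrong):

  quota = 600000us, period = 100000us  ->  600% steady-state cap
  burst = 400000us of unused quota can be carried over
  one period can then use up to 600000 + 400000 = 1000000us  ->  1000%
  after which the saved 400000us has to be re-accumulated by running
  below 600% in later periods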
> @@ -9427,7 +9478,8 @@ static int cpu_max_show(struct seq_file *sf, void *v)
> {
> struct task_group *tg = css_tg(seq_css(sf));
>
> - cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg));
> + cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg),
> + tg_get_cfs_burst(tg));
> return 0;
> }
The current cgroup v2 docs say the following:
> cpu.max
> A read-write two value file which exists on non-root cgroups.
> The default is "max 100000".
This will become a "three value file", and I know a few user space
projects that parse this file by splitting on the middle space. I am not
sure if they are "wrong", but I don't think we usually break such things.
Not sure what Tejun thinks about this.
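To illustrate the concern (hypothetical v2 path and values, assuming the
patch simply appends the burst as a third field, as cpu_max_show() above
suggests):

  # today:
  $ cat /sys/fs/cgroup/test/cpu.max
  600000 100000

  # with this patch, as I read it:
  $ cat /sys/fs/cgroup/test/cpu.max
  600000 100000 400000

A parser that splits on whitespace and expects exactly two fields would
then see three.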
Thanks
Odin