Message-ID: <YNHjZqbtzoOy8w87@hirez.programming.kicks-ass.net>
Date:   Tue, 22 Jun 2021 15:19:34 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Huaixin Chang <changhuaixin@...ux.alibaba.com>
Cc:     luca.abeni@...tannapisa.it, anderson@...unc.edu, baruah@...tl.edu,
        bsegall@...gle.com, dietmar.eggemann@....com,
        dtcccc@...ux.alibaba.com, juri.lelli@...hat.com,
        khlebnikov@...dex-team.ru, linux-kernel@...r.kernel.org,
        mgorman@...e.de, mingo@...hat.com, odin@...d.al, odin@...dal.com,
        pauld@...head.com, pjt@...gle.com, rostedt@...dmis.org,
        shanpeic@...ux.alibaba.com, tj@...nel.org,
        tommaso.cucinotta@...tannapisa.it, vincent.guittot@...aro.org,
        xiyou.wangcong@...il.com
Subject: Re: [PATCH v6 1/3] sched/fair: Introduce the burstable CFS controller

On Mon, Jun 21, 2021 at 05:27:58PM +0800, Huaixin Chang wrote:
> The CFS bandwidth controller limits the CPU time of a task group to
> quota during each period. However, parallel workloads can be bursty,
> so they get throttled even when their average utilization is under
> quota. They are often latency sensitive at the same time, which makes
> throttling them undesirable.
> 
> We borrow time now against our future underrun, at the cost of increased
> interference against the other system users. All nicely bounded.
> 
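> As an illustration, a kernel-style sketch of such a refill rule (the
> names and the helper are mine, not lifted from the patch): leftover
> runtime carries over from period to period, but the total available
> after a refresh is capped at quota + burst.
> 
> 	/* sketch: runtime available to the group for the new period */
> 	static u64 refill(u64 runtime, u64 quota, u64 burst)
> 	{
> 		runtime += quota;		/* the regular refill */
> 		if (runtime > quota + burst)	/* cap accumulated slack */
> 			runtime = quota + burst;
> 		return runtime;
> 	}
> 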
> Traditional (UP-EDF) bandwidth control is something like:
> 
>   (U = \Sum u_i) <= 1
> 
> This guarantees both that every deadline is met and that the system is
> stable. After all, if U were > 1, then for every second of walltime,
> we'd have to run more than a second of program time, and obviously miss
> our deadline, but the next deadline will be further out still; there is
> never time to catch up, unbounded fail.
> 
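> Concretely, with U = 1.1 each second of walltime adds 0.1s of
> unfinished work; after t seconds the backlog is 0.1*t, growing without
> bound.
> 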
> This work observes that a workload doesn't always execute the full
> quota; this enables one to describe u_i as a statistical distribution.
> 
> For example, have u_i = {x,e}_i, where x is the p(95) and x+e the p(100)
> (the traditional WCET). This effectively allows u to be smaller,
> increasing the efficiency (we can pack more tasks in the system), but at
> the cost of missing deadlines when all the odds line up. However, it
> does maintain stability, since every overrun must be paired with an
> underrun as long as our x is above the average.
> 
> That is, suppose we have 2 tasks, both specify a p(95) value, then we
> have a p(95)*p(95) = 90.25% chance both tasks are within their quota and
> everything is good. At the same time we have a p(5)*p(5) = 0.25% chance
> both tasks will exceed their quota at the same time (guaranteed deadline
> fail). Somewhere in between there's a threshold where one exceeds and
> the other doesn't underrun enough to compensate; this depends on the
> specific CDFs.
> 
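> To make the arithmetic concrete, a throwaway check (assuming
> independent tasks and the 95%/5% split; not part of the patch):
> 
> 	#include <math.h>
> 	#include <stdio.h>
> 
> 	/* chance that all, resp. none, of n p(95)-provisioned tasks
> 	 * stay within their quota in a given period */
> 	int main(void)
> 	{
> 		int n;
> 
> 		for (n = 1; n <= 4; n++)
> 			printf("n=%d P(all within)=%.4f P(all exceed)=%.6f\n",
> 			       n, pow(0.95, n), pow(0.05, n));
> 		return 0;
> 	}
> 
> For n=2 this reproduces the 90.25% and 0.25% above.
> 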
> At the same time, we can say that the worst case deadline miss, will be
> \Sum e_i; that is, there is a bounded tardiness (under the assumption
> that x+e is indeed WCET).
> 
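> E.g. two tasks each specifying e_i = 5ms: no deadline is ever missed
> by more than e_1 + e_2 = 10ms, provided x+e really is the WCET.
> 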
> The benefit of burst is seen when testing with schbench. The default
> values of kernel.sched_cfs_bandwidth_slice_us (5ms) and CONFIG_HZ
> (1000) are used.
> 
> 	# cgroup v1 cpu controller; cpu.cfs_period_us defaults to 100000us
> 	mkdir /sys/fs/cgroup/cpu/test
> 	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
> 	# quota == period: one full CPU on average
> 	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
> 	# let up to 100ms of unused quota accumulate for bursts
> 	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
> 
> 	./schbench -m 1 -t 3 -r 20 -c 80000 -R 10
> 
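> (If -R 10 and -c 80000 mean ~10 requests/s each burning ~80ms of CPU,
> the offered load is ~0.8 of one CPU on average, against a quota of
> 100ms per 100ms period.)
> 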
> The average CPU usage is at 80%. I ran this 10 times: long tail
> latency showed up in 6 runs, and the group was throttled in 8.
> 
> Tail latencies from one run are shown below; it was not the worst case.
> 
> 	Latency percentiles (usec)
> 		50.0000th: 19872
> 		75.0000th: 21344
> 		90.0000th: 22176
> 		95.0000th: 22496
> 		*99.0000th: 22752
> 		99.5000th: 22752
> 		99.9000th: 22752
> 		min=0, max=22727
> 	rps: 9.90 p95 (usec) 22496 p99 (usec) 22752 p95/cputime 28.12% p99/cputime 28.44%
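> 
> (The p95/cputime and p99/cputime figures look like tail latency
> relative to the ~80ms of CPU per request: 22496/80000 = 28.12% and
> 22752/80000 = 28.44%.)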
> 
> The interference when using burst is evaluated by the probability of
> missing the deadline and by the average WCET. Test results showed that
> when there are many cgroups, or when the CPU is under-utilized, the
> interference is limited. More details are shown in:
> https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/
> 
> Co-developed-by: Shanpei Chen <shanpeic@...ux.alibaba.com>
> Signed-off-by: Shanpei Chen <shanpeic@...ux.alibaba.com>
> Co-developed-by: Tianchen Ding <dtcccc@...ux.alibaba.com>
> Signed-off-by: Tianchen Ding <dtcccc@...ux.alibaba.com>
> Signed-off-by: Huaixin Chang <changhuaixin@...ux.alibaba.com>
> ---

Ben, what say you? I'm tempted to pick up at least this first patch.
