linux-kernel - Re: [PATCH v6 2/3] sched/fair: Add cfs bandwidth burst statistics


Message-Id: <8D470B39-00F8-4C87-92B8-BA645639AB24@linux.alibaba.com>
Date:   Fri, 2 Jul 2021 19:31:54 +0800
From:   changhuaixin <changhuaixin@...ux.alibaba.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     changhuaixin <changhuaixin@...ux.alibaba.com>,
        luca.abeni@...tannapisa.it, anderson@...unc.edu, baruah@...tl.edu,
        Benjamin Segall <bsegall@...gle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        dtcccc@...ux.alibaba.com, Juri Lelli <juri.lelli@...hat.com>,
        khlebnikov@...dex-team.ru,
        open list <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...hat.com>,
        Odin Ugedal <odin@...d.al>, Odin Ugedal <odin@...dal.com>,
        pauld@...head.com, Paul Turner <pjt@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Shanpei Chen <shanpeic@...ux.alibaba.com>,
        Tejun Heo <tj@...nel.org>, tommaso.cucinotta@...tannapisa.it,
        Vincent Guittot <vincent.guittot@...aro.org>,
        xiyou.wangcong@...il.com
Subject: Re: [PATCH v6 2/3] sched/fair: Add cfs bandwidth burst statistics



> On Jun 28, 2021, at 11:00 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Mon, Jun 21, 2021 at 05:27:59PM +0800, Huaixin Chang wrote:
>> The following statistics are added to the cpu.stat file to show how much a
>> workload makes use of cfs_b burst:
>> 
>> nr_bursts:  number of periods bandwidth burst occurs
>> burst_usec: cumulative wall-time that any cpus has
>> 	    used above quota in respective periods
>> 
>> The larger nr_bursts is, the more bursty periods there are. And the larger
>> burst_usec is, the more burst time is used by bursty workload.
> 
> That's what it does, but fails to explain why. How is this number
> useful.
> 

How about this?

The cfs_b burst feature avoids throttling by allowing bandwidth bursts. When using cfs_b
burst, users configure burst and judge whether it helps from workload latency and from cfs_b
interval statistics such as nr_throttled. Two new statistics are also introduced to expose the
internals of the burst feature and to explain why burst helps or not:

	nr_bursts:  number of periods in which a bandwidth burst occurs
	burst_usec: cumulative wall-time that CPUs have
		    used above quota in the respective periods
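
To make the two counters concrete, here is a minimal user-space sketch of the
accounting done by the v6 hunk quoted below. It is only an illustration: struct
cfs_bw_model, model_consume() and model_refill() are hypothetical names, the
values are in arbitrary time units, and none of this is kernel code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the cfs_bandwidth fields the patch touches. */
struct cfs_bw_model {
	uint64_t quota;                   /* runtime granted per period */
	uint64_t burst;                   /* extra runtime that may accumulate */
	uint64_t runtime;                 /* runtime currently available */
	uint64_t runtime_at_period_start; /* snapshot taken at the last refill */
	uint64_t burst_time;              /* cumulative time used above quota */
	uint64_t nr_burst;                /* periods in which a burst occurred */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Model a group consuming runtime during a period (never below zero). */
static void model_consume(struct cfs_bw_model *b, uint64_t used)
{
	b->runtime -= min_u64(used, b->runtime);
}

/* Period boundary: account any burst, then refill as the v6 patch does. */
static void model_refill(struct cfs_bw_model *b)
{
	if (b->runtime_at_period_start > b->runtime) {
		uint64_t used = b->runtime_at_period_start - b->runtime;

		/* Only usage beyond one quota counts as burst. */
		if (used > b->quota) {
			b->burst_time += used - b->quota;
			b->nr_burst++;
		}
	}

	b->runtime += b->quota;
	b->runtime = min_u64(b->runtime, b->quota + b->burst);
	b->runtime_at_period_start = b->runtime;
}
```

With quota = 100 and burst = 50, a period that consumes 40 followed by a period
that consumes 130 records one burst at the second refill, with 30 units of burst
time. A workload that never exceeds quota in any period leaves both counters at
zero, which is how the statistics show whether the configured burst is being
used at all.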


>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 53d7cc4d009b..62b73722e510 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4634,11 +4634,22 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>>  */
>> void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>> {
>> +	u64 runtime;
>> +
>> 	if (unlikely(cfs_b->quota == RUNTIME_INF))
>> 		return;
>> 
>> +	if (cfs_b->runtime_at_period_start > cfs_b->runtime) {
>> +		runtime = cfs_b->runtime_at_period_start - cfs_b->runtime;
> 
> That comparison is the same as the subtraction; might as well write
> this:
> 
>> +		if (runtime > cfs_b->quota) {
>> +			cfs_b->burst_time += runtime - cfs_b->quota;
> 
> Same here.
> 
>> +			cfs_b->nr_burst++;
>> +		}
>> +	}
> 
> 
> Perhaps we can write that like:
> 
> 	s64 runtime = cfs_b->runtime_snapshot - cfs_b->runtime;
> 	if (runtime > 0) {
> 		s64 bursttime = runtime - cfs_b->quota;
> 		if (bursttime > 0) {
> 			cfs_b->burst_time += bursttime;
> 			cfs_b->nr_bursts++;
> 		}
> 	}
> 
> I was hoping we could get away with something simpler, like maybe:
> 

Got it.

> 	u64 old_runtime = cfs_b->runtime;
> 
> 	cfs_b->runtime += cfs_b->quota;
> 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> 
> 	if (cfs_b->runtime - old_runtime > cfs_b->quota)
> 		cfs_b->nr_bursts++;
> 
> Would that be good enough?
> 
> 
>> +
>> 	cfs_b->runtime += cfs_b->quota;
>> 	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>> +	cfs_b->runtime_at_period_start = cfs_b->runtime;
>> }
>> 
>> static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index d317ca74a48c..b770b553dfbb 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -367,6 +367,7 @@ struct cfs_bandwidth {
>> 	u64			quota;
>> 	u64			runtime;
>> 	u64			burst;
>> +	u64			runtime_at_period_start;
>> 	s64			hierarchical_quota;
> 
> As per the above, I don't really like that name, runtime_snapshot or
> perhaps runtime_snap is shorter and not less clear. But not having it at
> all would be even better.