Message-ID: <YKe+H4F5I/L/+K8M@hirez.programming.kicks-ass.net>
Date: Fri, 21 May 2021 16:05:19 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Odin Ugedal <odin@...d.al>
Cc: Huaixin Chang <changhuaixin@...ux.alibaba.com>,
Benjamin Segall <bsegall@...gle.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
dtcccc@...ux.alibaba.com, Juri Lelli <juri.lelli@...hat.com>,
khlebnikov@...dex-team.ru,
open list <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...hat.com>,
pauld@...head.com, Paul Turner <pjt@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
shanpeic@...ux.alibaba.com, Tejun Heo <tj@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
xiyou.wangcong@...il.com
Subject: Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics
On Thu, May 20, 2021 at 04:11:52PM +0200, Odin Ugedal wrote:
> I am a bit sceptical about both the nr_burst and burst_time as they are now.
>
> As an example; a control group using "99.9%" of the quota each period
> and that is never throttled. Such group would with this patch with a burst of X
> still get nr_throttled = 0 (as before), but it would get a nr_burst and
> burst_time that will keep increasing.
>
> I think there is a big difference between runtime moved/taken from
> cfs_b->runtime to cfs_rq->runtime_remaining and the actual runtime used
> in the period. Currently, cfs bw can only supply info about the first one, and
> not the latter.
>
> I think that if people see nr_burst increasing, that they think they _have_
> to use cfs burst in order to avoid being throttled, even though that might
> not be the case. It is probably fine as is, as long as it is explicitly stated
> what the values mean and imply, and what they do not. I cannot see another
> way to calculate it as it is now, but maybe someone else has some thoughts.

You can always trace the system. I don't think we have nice tracepoints
for any of this, but much can be inferred from the scheduler and hrtimer
tracepoints. Also kprobes might be employed to stick in more appropriate
thingies I suppose.

You can also run the workload without bandwidth controls and measure
its job execution times, and from that compute the bandwidth settings,
all without tracepoints.
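
As a rough sketch of that second approach: suppose you have recorded how
much CPU time the unthrottled workload consumed in each period (in
microseconds). You can then replay those samples against candidate
quota/burst values and count how many periods would have run out of
runtime. The function name, parameter names and sample numbers below are
made up for illustration, and the refill rule is a simplified model of
the bandwidth accounting, not the kernel's exact behaviour (it ignores
per-CPU slices and work carried over after a throttle).

```python
def count_throttled_periods(usage_us, quota_us, burst_us=0):
    """Replay per-period CPU usage against a simplified quota/burst model.

    usage_us: CPU time (microseconds) the unthrottled workload used in
              each period, as measured without bandwidth controls.
    quota_us: candidate quota per period (hypothetical setting).
    burst_us: candidate cap on unused runtime carried across periods.
    Returns the number of periods in which demand exceeded the budget.
    """
    if quota_us <= 0 or burst_us < 0:
        raise ValueError("quota must be positive and burst non-negative")

    runtime = quota_us  # budget available in the first period
    throttled = 0
    for used in usage_us:
        if used > runtime:
            # Demand exceeded the budget: the group would have been throttled.
            throttled += 1
            runtime = 0
        else:
            runtime -= used
        # Period boundary: keep at most burst_us of leftover, then add quota.
        runtime = min(runtime, burst_us) + quota_us
    return throttled


# Made-up samples: mostly 50ms per period with one 150ms spike.
samples = [50_000, 50_000, 150_000, 50_000]
no_burst = count_throttled_periods(samples, quota_us=100_000)
with_burst = count_throttled_periods(samples, quota_us=100_000, burst_us=50_000)
```

With these samples the spike is throttled when there is no burst, and
fits once 50ms of unused runtime is allowed to accumulate. A group that
steadily uses 99.9% of its quota is never throttled in this model either
way, which is the case Odin describes.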