linux-kernel - Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation helper function

Message-ID: <20180425082311.GH14391@e108498-lin.cambridge.arm.com>
Date:   Wed, 25 Apr 2018 09:23:12 +0100
From:   Quentin Perret <quentin.perret@....com>
To:     Leo Yan <leo.yan@...aro.org>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Thara Gopinath <thara.gopinath@...aro.org>,
        linux-pm@...r.kernel.org,
        Morten Rasmussen <morten.rasmussen@....com>,
        Chris Redpath <chris.redpath@....com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Valentin Schneider <valentin.schneider@....com>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Todd Kjos <tkjos@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steve Muckle <smuckle@...gle.com>,
        Eduardo Valentin <edubezval@...il.com>
Subject: Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation
 helper function

Hi Leo,

Sorry for the delay in responding...

On Saturday 21 Apr 2018 at 00:27:53 (+0800), Leo Yan wrote:
> On Fri, Apr 20, 2018 at 03:42:45PM +0100, Quentin Perret wrote:
> > Hi Leo,
> > 
> > On Wednesday 18 Apr 2018 at 20:15:47 (+0800), Leo Yan wrote:
> > > Sorry for the mess of spreading my questions over several replies;
> > > I will try to ask them all in one reply in future.  Below are more
> > > questions that are worth bringing up:
> > > 
> > > The code for energy computation is quite neat and simple, but I think
> > > the energy computation mixes two concepts of CPU util: one is the
> > > estimated CPU util, which is used to select the CPU OPP in schedutil;
> > > the other is the raw CPU util based on the CPU's real running time.
> > > For example, cpu_util_next() predicts CPU util, but this value might be
> > > much higher than cpu_util(), especially once the UTIL_EST feature is
> > > enabled (I have only a shallow understanding of UTIL_EST, so correct me
> > > as needed);
> > 
> > I'm not sure I understand what you mean by higher than cpu_util()
> > here ... In which case would that happen?
> 
> After the UTIL_EST feature is enabled, cpu_util_next() returns a higher
> value than cpu_util(), see the code 'util = max(util, util_est);' below.
> As a result, cpu_util_next() takes into account the extra compensation
> introduced by UTIL_EST.
> 
> 	if (sched_feat(UTIL_EST)) {
> 	        util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> 	        if (dst_cpu == cpu)
> 	                util_est += _task_util_est(p);
> 	        else
> 	                util_est = max_t(long, util_est - _task_util_est(p), 0);
> 	        util = max(util, util_est);
> 	}

So, cpu_util() accounts for the UTIL_EST compensation:

	static inline unsigned long cpu_util(int cpu)
	{
		struct cfs_rq *cfs_rq;
		unsigned int util;

		cfs_rq = &cpu_rq(cpu)->cfs;
		util = READ_ONCE(cfs_rq->avg.util_avg);

		if (sched_feat(UTIL_EST))
			util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));

		return min_t(unsigned long, util, capacity_orig_of(cpu));
	}

So cpu_util_next() just mimics that.
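
To make the comparison concrete, here is a minimal user-space sketch of
the two helpers. It is not the kernel code: the model_* names and both
structs are hypothetical stand-ins, UTIL_EST is assumed to be enabled,
and the way the task's util_avg is added to or removed from the CPU is
an assumption about the part of cpu_util_next() that is not quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the per-CPU and per-task signals. */
struct model_cpu {
	long util_avg;		/* decayed PELT signal of the CFS rq */
	long util_est_enqueued;	/* util_est sum of the enqueued tasks */
	long capacity_orig;
};

struct model_task {
	long util_avg;
	long util_est;
};

static long max_l(long a, long b) { return a > b ? a : b; }
static long min_l(long a, long b) { return a < b ? a : b; }

/* Same shape as cpu_util() above, with UTIL_EST assumed enabled. */
static long model_cpu_util(const struct model_cpu *c)
{
	long util = max_l(c->util_avg, c->util_est_enqueued);

	return min_l(util, c->capacity_orig);
}

/*
 * Predicted model_cpu_util() of @c once @p has been added to it
 * (is_dst == true) or removed from it (is_dst == false).
 */
static long model_cpu_util_next(const struct model_cpu *c,
				const struct model_task *p, bool is_dst)
{
	struct model_cpu next = *c;

	if (is_dst) {
		next.util_avg += p->util_avg;
		next.util_est_enqueued += p->util_est;
	} else {
		next.util_avg = max_l(next.util_avg - p->util_avg, 0);
		next.util_est_enqueued =
			max_l(next.util_est_enqueued - p->util_est, 0);
	}

	return model_cpu_util(&next);
}
```

In this model, a task that contributes nothing makes
model_cpu_util_next() return exactly model_cpu_util(): the max() is
applied in both, so the prediction is not inflated relative to what
cpu_util() itself would report after the enqueue.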

> 
> > cpu_util_next() is basically used to figure out what will be the
> > cpu_util() of CPU A after task p has been enqueued on CPU B (no matter
> > what A and B are).
> 
> Same as the description above: cpu_util_next() is not the same thing
> as cpu_util(); cpu_util_next() takes into account the extra
> compensation introduced by UTIL_EST.
> 
> > > but this patch simply computes CPU capacity and energy from a single
> > > CPU utilization value (and it will be an inflated value after UTIL_EST
> > > is enabled).  Is this intended to keep the implementation simple?
> > > 
> > > IMHO, cpu_util_next() can be used to predict CPU capacity; on the other
> > > hand, should we use the CPU util without UTIL_EST capping for
> > > 'sum_util'?  That might reflect the CPU utilization more reasonably.
> > 
> > Why would a decayed utilisation be a better estimate of the time that
> > a task is going to spend on a CPU ?
> 
> IIUC, in the scheduler wakeup path task_util() is the task utilisation
> from before the task went to sleep, so it's not a decayed value.

I don't think this is correct. sync_entity_load_avg() is called in
select_task_rq_fair() so task_util() *is* decayed upon wakeup.
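
For reference, a rough user-space sketch of what "decayed" means here.
It assumes the usual PELT geometry (periods of about 1ms and a decay
factor y such that y^32 = 0.5). model_decay_util() is a hypothetical
helper that uses floating point for readability; the kernel uses
fixed-point arithmetic instead.

```c
/* Hypothetical helper: decay a utilization value over @periods PELT periods. */
static unsigned long model_decay_util(unsigned long util, unsigned int periods)
{
	const double y = 0.97857206;	/* approximately 0.5^(1/32) */
	double v = (double)util;

	/* Every full block of 32 periods halves the signal. */
	for (; periods >= 32; periods -= 32)
		v *= 0.5;

	/* Each remaining period decays the signal by y. */
	while (periods--)
		v *= y;

	return (unsigned long)v;
}
```

Under these assumptions, a task with a util_avg of 400 that slept for
about 32ms wakes up with a task_util() of roughly 200 once its signal
has been synced.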

> cpu_util() is a
> decayed value,

This is not necessarily correct either. As mentioned above, cpu_util()
includes the UTIL_EST compensation, so the value isn't necessarily
decayed.

> but isn't this just what we want, to reflect the CPU's historic
> utilisation over the recent past?  This is the reason I suggested
> using 'cpu_util() + task_util()' as the estimate.
> 
> I understand this patch tries to use a pre-decayed value,

No, this patch tries to estimate what will be the return value of
cpu_util() if the task is enqueued on a specific CPU. This value can be
the util_avg (decayed) or the util_est (non-decayed) depending on the
conditions.

> please review
> whether the example below has an issue or not:
> if one CPU's cfs_rq->avg.util_est.enqueued has quite a high value, and this
> CPU then enters an idle state and sleeps for a long while, then if we use
> cfs_rq->avg.util_est.enqueued to estimate CPU utilisation, might this
> deviate a lot from the CPU run time if we place the waking task on it?  On
> the other hand, cpu_util() can decay during CPU idle time...
> 
> > > Furthermore, if we consider an RT thread running on a CPU managed by
> > > the 'schedutil' governor, the CPU will run at maximum frequency, but we
> > > cannot say the CPU has 100% utilization.  The RT thread case is not
> > > handled in this patch.
> > 
> > Right, we don't account for RT tasks in the OPP prediction for now.
> > Vincent's patches to have a util_avg for RT runqueues could help us
> > do that I suppose ...
> 
> Good to know this.
> 
> > Thanks !
> > Quentin