linux-kernel - Re: [RFC PATCH v3 09/10] sched/fair: Select an energy-efficient CPU on task wake-up


Message-ID: <20180608162612.GA17720@e108498-lin.cambridge.arm.com>
Date:   Fri, 8 Jun 2018 17:26:12 +0100
From:   Quentin Perret <quentin.perret@....com>
To:     Juri Lelli <juri.lelli@...hat.com>
Cc:     peterz@...radead.org, rjw@...ysocki.net,
        gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org,
        linux-pm@...r.kernel.org, mingo@...hat.com,
        dietmar.eggemann@....com, morten.rasmussen@....com,
        chris.redpath@....com, patrick.bellasi@....com,
        valentin.schneider@....com, vincent.guittot@...aro.org,
        thara.gopinath@...aro.org, viresh.kumar@...aro.org,
        tkjos@...gle.com, joelaf@...gle.com, smuckle@...gle.com,
        adharmap@...cinc.com, skannan@...cinc.com, pkondeti@...eaurora.org,
        edubezval@...il.com, srinivas.pandruvada@...ux.intel.com,
        currojerez@...eup.net, javi.merino@...nel.org
Subject: Re: [RFC PATCH v3 09/10] sched/fair: Select an energy-efficient CPU
 on task wake-up

On Friday 08 Jun 2018 at 13:59:28 (+0200), Juri Lelli wrote:
> On 08/06/18 12:19, Quentin Perret wrote:
> > On Friday 08 Jun 2018 at 12:24:46 (+0200), Juri Lelli wrote:
> > > Hi,
> > > 
> > > On 21/05/18 15:25, Quentin Perret wrote:
> > > 
> > > [...]
> > > 
> > > > +static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> > > > +{
> > > > +	unsigned long cur_energy, prev_energy, best_energy, cpu_cap, task_util;
> > > > +	int cpu, best_energy_cpu = prev_cpu;
> > > > +	struct sched_energy_fd *sfd;
> > > > +	struct sched_domain *sd;
> > > > +
> > > > +	sync_entity_load_avg(&p->se);
> > > > +
> > > > +	task_util = task_util_est(p);
> > > > +	if (!task_util)
> > > > +		return prev_cpu;
> > > > +
> > > > +	/*
> > > > +	 * Energy-aware wake-up happens on the lowest sched_domain starting
> > > > +	 * from sd_ea spanning over this_cpu and prev_cpu.
> > > > +	 */
> > > > +	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
> > > > +	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
> > > > +		sd = sd->parent;
> > > > +	if (!sd)
> > > > +		return -1;
> > > 
> > > Shouldn't this be return prev_cpu?
> > 
> > Well, you shouldn't be entering this function without an sd_ea pointer,
> > so this case is a sort of bug I think. By returning -1 I think we should
> > end up picking a CPU using select_fallback_rq(), which sort of makes
> > sense ?
> 
> I fear cpumask_test_cpu() and such won't be happy with a -1 arg.
> If it's a recoverable bug, I'd say return prev and WARN_ON_ONCE() ?

Hmmm, yes, prev + WARN_ON_ONCE is probably appropriate here then.
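
For illustration only (this is not the patch itself), the fallback agreed
above can be sketched in plain userspace C. Everything prefixed with toy_ is
a hypothetical stand-in: toy_warn_on_once() imitates the kernel's
WARN_ON_ONCE(), struct toy_domain replaces struct sched_domain with a simple
bitmask, and the final "lowest CPU in the domain" step is only a placeholder
for the real energy comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): warns at most once, returns cond. */
static bool toy_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical toy domain: a bitmask of CPUs plus a link to the parent. */
struct toy_domain {
	unsigned long span;
	struct toy_domain *parent;
};

static int toy_find_cpu(struct toy_domain *sd, int prev_cpu)
{
	int cpu = 0;

	assert(prev_cpu >= 0 && prev_cpu < (int)(8 * sizeof(unsigned long)));

	/* Walk up the hierarchy until a domain spans prev_cpu. */
	while (sd && !(sd->span & (1UL << prev_cpu)))
		sd = sd->parent;

	/* Treated as a recoverable bug: warn once and keep prev_cpu, never -1. */
	if (toy_warn_on_once(!sd, "no domain spans prev_cpu"))
		return prev_cpu;

	/* Placeholder for the energy comparison: lowest CPU in the domain. */
	while (!(sd->span & (1UL << cpu)))
		cpu++;
	return cpu;
}
```

The point of the sketch is that the caller always gets back a valid CPU
number, so a later cpumask_test_cpu()-style check never sees -1.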

> 
> > > > +
> > > > +	if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
> > > > +		prev_energy = best_energy = compute_energy(p, prev_cpu);
> > > > +	else
> > > > +		prev_energy = best_energy = ULONG_MAX;
> > > > +
> > > > +	for_each_freq_domain(sfd) {
> > > > +		unsigned long spare_cap, max_spare_cap = 0;
> > > > +		int max_spare_cap_cpu = -1;
> > > > +		unsigned long util;
> > > > +
> > > > +		/* Find the CPU with the max spare cap in the freq. dom. */
> > > 
> > > I understand this being a heuristic to cut some overhead, but shouldn't
> > > the model tell between packing vs. spreading?
> > 
> > Ah, that's a very interesting one :-) !
> > 
> > So, with only active costs of the CPUs in the model, we can't really
> > tell what's best between packing or spreading between identical CPUs if
> > the migration of the task doesn't change the OPP request.
> > 
> > In a frequency domain, all the "best" CPU candidates for a task are
> > those for which we'll request a low OPP. When there are several CPUs for
> > which the OPP request will be the same, we just don't know which one to
> > pick from an energy standpoint, because we don't have other energy costs
> > (for idle states, for example) to break the tie.
> > 
> > With this EM, the interesting thing is that if you assume that OPP
> > requests follow utilization, you are _guaranteed_ that the CPU with
> > the max spare capacity in a freq domain will always be among the best
> > candidates of this freq domain. And since we don't know how to
> > differentiate those candidates, why not use this one ?
> > 
> > Yes, it _might_ be better from an energy standpoint to pack small tasks
> > on a CPU in order to let other CPUs go in deeper idle states. But that
> > also hurts your chances to go cluster idle. Which solution is the best ?
> > It depends, and we have no way to tell with this EM.
> > 
> > This approach basically favors cluster-packing, and spreading inside a
> > cluster. That should at least be a good thing for latency, and this is
> > consistent with the idea that most of the energy savings come from the
> > asymmetry of the system, and not so much from breaking the tie between
> > identical CPUs. That's also the reason why EAS is enabled only if your
> > system has SD_ASYM_CPUCAPACITY set, as we already discussed for patch
> > 05/10 :-).
> > 
> > Does that make sense ?
> 
> Yes, thanks for the explanation. It would probably make sense to copy
> and paste your text above somewhere in comment/doc for future ref.

OK, will do.
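
For illustration only, here is a small userspace sketch of the max spare
capacity argument quoted above. It assumes identical CPUs in one frequency
domain and that the OPP request follows the highest utilization in that
domain. struct toy_cpu and both helpers are hypothetical names, not code
from the patch; only the max_spare_cap / max_spare_cap_cpu bookkeeping
mirrors the declarations visible in the quoted hunk.

```c
#include <stddef.h>

/* Hypothetical per-CPU view: current utilization and capacity. */
struct toy_cpu {
	unsigned long util;
	unsigned long cap;
};

/*
 * Return the index of the CPU with the most spare capacity once the task
 * is added, or -1 if no CPU has any spare capacity left.
 */
static int toy_max_spare_cap_cpu(const struct toy_cpu *cpus, size_t n,
				 unsigned long task_util)
{
	unsigned long max_spare_cap = 0;
	int max_spare_cap_cpu = -1;

	for (size_t i = 0; i < n; i++) {
		unsigned long util = cpus[i].util + task_util;
		unsigned long spare_cap;

		/* Skip CPUs that the task would not fit on. */
		if (util > cpus[i].cap)
			continue;

		spare_cap = cpus[i].cap - util;
		if (spare_cap > max_spare_cap) {
			max_spare_cap = spare_cap;
			max_spare_cap_cpu = (int)i;
		}
	}
	return max_spare_cap_cpu;
}

/*
 * Highest utilization in the domain if the task is placed on CPU dst.
 * Under the stated assumption, this is what drives the OPP request.
 */
static unsigned long toy_domain_max_util(const struct toy_cpu *cpus, size_t n,
					 unsigned long task_util, int dst)
{
	unsigned long max_util = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long util = cpus[i].util;

		if ((int)i == dst)
			util += task_util;
		if (util > max_util)
			max_util = util;
	}
	return max_util;
}
```

With utilizations {300, 100, 200} on CPUs of capacity 1024 and a task of
utilization 150, the max spare capacity CPU is index 1, and placing the task
there leaves the domain maximum at 300, whereas CPU 0 would raise it to 450.
With {300, 100, 120} and a task of 50, CPUs 1 and 2 both leave the maximum at
300: that is the tie this energy model cannot break.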

Thanks !
Quentin