Message-ID: <CAKfTPtAooR2Qqko6Cfe=hQiTqJEzHX0j5uydza-SxdGDTsoGYA@mail.gmail.com>
Date: Wed, 25 Sep 2024 15:28:12 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: Christian Loehle <christian.loehle@....com>, qyousef@...alina.io, hongyan.xia2@....com,
mingo@...hat.com, peterz@...radead.org, linux-kernel@...r.kernel.org,
rafael.j.wysocki@...el.com, lukasz.luba@....com, vschneid@...hat.com,
mgorman@...e.de, bsegall@...gle.com, rostedt@...dmis.org,
dietmar.eggemann@....com, juri.lelli@...hat.com
Subject: Re: [RFC PATCH 4/5] sched/fair: Use EAS also when overutilized
On Thu, 19 Sept 2024 at 10:26, Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Vincent,
> I tried this patch on a Pixel 6 (8 CPUs, 4 little, 2 mid, 2 big)
> with patches 1-4/5 using these workloads:
> ---
> A.
> a. 8 tasks at 2%/5%/10% during 1s
> b. 1 task:
> - sleeping during 0.3s
> - at 100% during 0.3s
> - sleeping during 0.3s
>
> b. is used to reach the overutilized state for a limited amount of time:
> EAS operates, then the load balancer does the task placement, then EAS
> operates again. (A minimal sketch of task b is shown below.)
>
> B.
> a. 8 tasks at 2%/5%/10% during 1s
> b. 1 task:
> - at 100% during 1s
>
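> For reference, task b of workload A is roughly equivalent to the
> following minimal sketch (illustrative only; the 2%/5%/10% tasks and
> workload B's task b follow the same busy/sleep pattern):
>
>     #include <time.h>
>     #include <unistd.h>
>
>     /* Busy-loop for 'secs' seconds of wall time. */
>     static void busy_for(double secs)
>     {
>             struct timespec t0, t;
>
>             clock_gettime(CLOCK_MONOTONIC, &t0);
>             do {
>                     clock_gettime(CLOCK_MONOTONIC, &t);
>             } while ((t.tv_sec - t0.tv_sec) +
>                      (t.tv_nsec - t0.tv_nsec) / 1e9 < secs);
>     }
>
>     int main(void)
>     {
>             usleep(300000);  /* sleeping during 0.3s */
>             busy_for(0.3);   /* at 100% during 0.3s */
>             usleep(300000);  /* sleeping during 0.3s */
>             return 0;
>     }
>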
> ---
> I'm seeing the energy consumption increase in some cases. This seems to be
> due to feec() migrating tasks more often than the load balancer does for
> this workload, which leads to utilization 'spikes' and then frequency
> 'spikes', increasing the overall energy consumption.
> This is not entirely related to this patch though, as the task placement seems
> correct. I.e. feec() effectively does an optimal placement given the EM and
> task utilization. The task placement is just a bit less stable than with
> the load balancer.
Would patch 5 help to keep things better placed? In particular, if
task b is misplaced at some point because of load balance?
I agree that load balance might still migrate tasks in a way that is
not energy efficient.
>
> ---
> Regarding hackbench, I've reproduced the test you've run on the same Pixel6.
> I have CONFIG_SCHED_CLUSTER enabled, which allows having a sched domain for
> each of the little/mid/big clusters (without the config, these groups would
> not exist).
Why did you do this? All CPUs are expected to be in the same sched
domain as they share their LLC.
>
> I see a significant regression in the results.
> I replaced the condition to run feec() in select_task_rq_fair() with:
> if (get_rd_overloaded(cpu_rq(cpu)->rd) == 0) {
overloaded is set when more than one task runs on a CPU, whatever the
utilization.
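For reference, the flag is computed roughly like this (simplified
excerpt from tip/sched/core, so the exact context may differ):

	/* kernel/sched/fair.c, update_sg_lb_stats(): any rq with more
	 * than one runnable task marks its group as overloaded ... */
	if (nr_running > 1)
		*sg_overloaded = 1;

	/* ... and the aggregated state is read back per root domain
	 * (kernel/sched/sched.h): */
	static inline bool get_rd_overloaded(struct root_domain *rd)
	{
		return READ_ONCE(rd->overloaded);
	}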
> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> ...
> }
> and obtained better results.
>
> Indeed, for such an intensive workload:
> - EAS would not have any energy benefit, so better prioritize performance
> (as Christian mentioned)
> - EAS would not be able to find a fitting CPU, so running feec() should be
> avoided
> - as you mention in the commit message, shuffling tasks when one CPU becomes
> momentarily overutilized is inefficient energy-wise (even though I don't have
> the numbers, it should make sense).
> So detecting when the system is overloaded should be a better compromise, I
> assume. The condition in sched_balance_find_src_group() to let the load balancer
> operate might also need to be updated.
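> Something like this, maybe (untested illustration against the current
> EAS bail-out in sched_balance_find_src_group(); the exact context may
> differ):
>
>     if (sched_energy_enabled()) {
>             struct root_domain *rd = env->dst_rq->rd;
>
>             /* keep letting EAS place tasks until the root domain is
>              * overloaded, not merely overutilized */
>             if (rcu_dereference(rd->pd) && !get_rd_overloaded(rd))
>                     goto out_balanced;
>     }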
>
> Note:
> - base: with patches 1-4/5
> - _ou: run feec() when not overutilized
> - _ol: run feec() when not overloaded
> - mean: hackbench execution time in s.
> - delta: relative change vs mean_base, in percent. Negative is better.
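>   (e.g. delta_ou for id 1: (3.0856 - 1.9786) / 1.9786 * 100 = 55.95)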
Could you share your command line? As explained in the cover letter, I
have seen some perf regressions but not in the range that you report
below.
What is your base? tip/sched/core?
> ┌─────┬───────────┬──────────┬─────────┬──────────┬─────────┬──────────┬──────────┬──────────┐
> │ id ┆ mean_base ┆ std_base ┆ mean_ou ┆ std_ou ┆ mean_ol ┆ std_ol ┆ delta_ou ┆ delta_ol │
> ╞═════╪═══════════╪══════════╪═════════╪══════════╪═════════╪══════════╪══════════╪══════════╡
> │ 1 ┆ 1.9786 ┆ 0.04719 ┆ 3.0856 ┆ 0.122209 ┆ 2.1734 ┆ 0.045203 ┆ 55.95 ┆ 9.85 │
> │ 2 ┆ 1.8991 ┆ 0.019768 ┆ 2.6672 ┆ 0.135266 ┆ 1.98875 ┆ 0.055132 ┆ 40.45 ┆ 4.72 │
> │ 3 ┆ 1.9053 ┆ 0.014795 ┆ 2.5761 ┆ 0.141693 ┆ 2.06425 ┆ 0.045901 ┆ 35.21 ┆ 8.34 │
> │ 4 ┆ 1.9586 ┆ 0.023439 ┆ 2.5823 ┆ 0.110399 ┆ 2.0955 ┆ 0.053818 ┆ 31.84 ┆ 6.99 │
> │ 5 ┆ 1.746 ┆ 0.055676 ┆ 3.3437 ┆ 0.279107 ┆ 1.88 ┆ 0.038184 ┆ 91.51 ┆ 7.67 │
> │ 6 ┆ 1.5476 ┆ 0.050131 ┆ 2.6835 ┆ 0.140497 ┆ 1.5645 ┆ 0.081644 ┆ 73.4 ┆ 1.09 │
> │ 7 ┆ 1.4562 ┆ 0.062457 ┆ 2.3568 ┆ 0.119213 ┆ 1.48425 ┆ 0.06212 ┆ 61.85 ┆ 1.93 │
> │ 8 ┆ 1.3554 ┆ 0.031757 ┆ 2.0609 ┆ 0.112869 ┆ 1.4085 ┆ 0.036601 ┆ 52.05 ┆ 3.92 │
> │ 9 ┆ 2.0391 ┆ 0.035732 ┆ 3.4045 ┆ 0.277307 ┆ 2.2155 ┆ 0.019053 ┆ 66.96 ┆ 8.65 │
> │ 10 ┆ 1.9247 ┆ 0.056472 ┆ 2.6605 ┆ 0.119417 ┆ 2.02775 ┆ 0.05795 ┆ 38.23 ┆ 5.35 │
> │ 11 ┆ 1.8923 ┆ 0.038222 ┆ 2.8113 ┆ 0.120623 ┆ 2.089 ┆ 0.025259 ┆ 48.57 ┆ 10.39 │
> │ 12 ┆ 1.9444 ┆ 0.034856 ┆ 2.6675 ┆ 0.219585 ┆ 2.1035 ┆ 0.076514 ┆ 37.19 ┆ 8.18 │
> │ 13 ┆ 1.7107 ┆ 0.04874 ┆ 3.4443 ┆ 0.154481 ┆ 1.8275 ┆ 0.036665 ┆ 101.34 ┆ 6.83 │
> │ 14 ┆ 1.5565 ┆ 0.056595 ┆ 2.8241 ┆ 0.158643 ┆ 1.5515 ┆ 0.040813 ┆ 81.44 ┆ -0.32 │
> │ 15 ┆ 1.4932 ┆ 0.085256 ┆ 2.6841 ┆ 0.135623 ┆ 1.50475 ┆ 0.028336 ┆ 79.75 ┆ 0.77 │
> │ 16 ┆ 1.4263 ┆ 0.067666 ┆ 2.3971 ┆ 0.145928 ┆ 1.414 ┆ 0.061422 ┆ 68.06 ┆ -0.86 │
> └─────┴───────────┴──────────┴─────────┴──────────┴─────────┴──────────┴──────────┴──────────┘
>
> On 9/17/24 22:24, Christian Loehle wrote:
> > On 8/30/24 14:03, Vincent Guittot wrote:
> >> Keep looking for an energy efficient CPU even when the system is
> >> overutilized, and use the CPU returned by feec() if it has been able to
> >> find one. Otherwise fall back to the default performance and spread mode
> >> of the scheduler.
> >> A system can become overutilized for a short time when workers of a
> >> workqueue wake up for short background work like a vmstat update.
> >> Continuing to look for an energy efficient CPU avoids breaking the
> >> power packing of tasks.
> >>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> >> ---
> >> kernel/sched/fair.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 2273eecf6086..e46af2416159 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -8505,7 +8505,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >> cpumask_test_cpu(cpu, p->cpus_ptr))
> >> return cpu;
> >>
> >> - if (!is_rd_overutilized(this_rq()->rd)) {
> >> + if (sched_energy_enabled()) {
> >> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> >> if (new_cpu >= 0)
> >> return new_cpu;
> >
> > Super quick testing on pixel6:
> > for i in $(seq 0 6); do /data/local/tmp/hackbench -l 500 -g 100 | grep Time; sleep 60; done
> > with patch 5/5 only:
> > Time: 19.433
> > Time: 19.657
> > Time: 19.851
> > Time: 19.789
> > Time: 19.857
> > Time: 20.092
> > Time: 19.973
> >
> > mainline:
> > Time: 18.836
> > Time: 18.718
> > Time: 18.781
> > Time: 19.015
> > Time: 19.061
> > Time: 18.950
> > Time: 19.166
> >
> >
> > The reason we didn't always have this enabled is the belief that
> > this costs us too much performance in the scenarios where we most
> > need it, while at best making subpar EAS decisions anyway (in an
> > overutilized state).
> > I'd be open to questioning that, but why the change of mind?
> > And why is this necessary in your series if the EAS selection
> > isn't 'final' (until the next sleep) anymore (Patch 5/5)?
> >