Message-ID: <CAKfTPtBjc+x0_8uGEp=f8nJxP0t2Z86kpjSEBY8HdDfa44ZF3g@mail.gmail.com>
Date: Fri, 8 Nov 2024 10:27:00 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Pierre Gondois <pierre.gondois@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, lukasz.luba@....com,
rafael.j.wysocki@...el.com, linux-kernel@...r.kernel.org, qyousef@...alina.io,
hongyan.xia2@....com
Subject: Re: [PATCH 0/5] sched/fair: Rework EAS to handle more cases
Hi Pierre,
On Thu, 7 Nov 2024 at 11:14, Pierre Gondois <pierre.gondois@....com> wrote:
>
> Hello Vincent,
> Related to feec(), but not to this patchset, I think there might be a
> concurrency issue while running feec().
yes, this is a known limitation
>
> Feec() doesn't have any locking mechanism. This means that multiple CPUs
> might run the function at the same time.
this is done on purpose as we don't want to lock and slow down the wakeup path
> If:
> - 2 tasks with approximately the same utilization wake up at the same time
> - some space on an energy efficient CPU is available
> feec() will likely select the same target for the 2 tasks.
yes
>
> Once feec() has determined a target for a task, the util signals are
> updated in enqueue_task_fair(). The delta between feec() and
> enqueue_task_fair() is ~20us (on a Pixel6). This is not much, but it
> still allows some other
20us is quite long. Is this the worst case, on a little core at the lowest freq?
> CPUs to run feec() with util signals that will be wrong in no time.
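> 
> As a rough illustration of the window (using the ~20us figure above,
> with arbitrary CPU numbers):
> 
>   CPU0: feec(taskA)  -> selects CPU4
>   CPU1: feec(taskB)  -> also selects CPU4, based on stale util
>   CPU0: enqueue taskA on CPU4  <- CPU4's util updated ~20us after feec()
>   CPU1: enqueue taskB on CPU4  <- CPU4 now holds both tasks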
>
> Note that it is also possible for one CPU to run feec() for 2 different
> tasks, decide to migrate both tasks to another target CPU, and only then
> start enqueueing them, meaning a single CPU will run feec() using util
> signals it knows are wrong.
isn't this case serialized, because CPU selection for the next task will
happen after enqueuing the 1st one?
>
> The issue is problematic as it creates some instability. Once a
> 'parallel selection' is done, the following scenarios can happen:
> - the system goes overutilized, and EAS is disabled
> - a frequency spike happens to handle the unexpected load.
> Then the perf. domain becomes less energy efficient compared to other
> perf. domains, and tasks are migrated out of this perf. domain.
>
> I made the following prototype to avoid 'parallel selections'. The goal
> is to tag CPUs that have a pending migration.
> A target CPU is tagged as 'eas_pending_enqueue' at the end of feec();
> other CPUs should therefore not consider this CPU as a valid candidate.
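> 
> As a distilled, self-contained sketch of this claim/release idea (plain
> C11 userspace code with illustrative names, not the kernel code; the
> actual prototype is in the diff at the end of this mail):
> 
>   #include <stdatomic.h>
>   #include <stdbool.h>
> 
>   #define NR_CPUS 8
> 
>   /* One flag per CPU: set while a migration decided by feec() has not
>    * yet been reflected in that CPU's util signals. */
>   static atomic_int pending_enqueue[NR_CPUS];
> 
>   /* End of feec(), when target != prev_cpu: try to claim the target.
>    * Returns true if we won the race; a loser falls back to prev_cpu. */
>   bool claim_cpu(int cpu)
>   {
>           int expected = 0;
> 
>           return atomic_compare_exchange_strong(&pending_enqueue[cpu],
>                                                 &expected, 1);
>   }
> 
>   /* enqueue_task_fair(): the task is enqueued and the util signals are
>    * up to date again, so other feec() calls may consider this CPU. */
>   void release_cpu(int cpu)
>   {
>           atomic_store(&pending_enqueue[cpu], 0);
>   }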
>
> The prototype is a bit raw, but it gives some good results. Using rt-app
> workloads, and trying not to have tasks waking up at the same time
> during the whole test:
> Workload1:
> N tasks with a period of 16ms and a util of 4 or 8. Each task starts with a
> 4ms delay. Each workload lasts 20s and is run over 5 iterations.
>
> Workload2:
> N tasks with a period of (8+n)ms and a util of 4 or 8, i.e. the first task
> has a period of 8ms, the second task a period of 9ms, etc. Each workload lasts
> 20s and is run over 5 iterations.
>
> The following are presented:
> - the measured energy consumed, according to the Pixel6 energy meters
> - the estimated energy consumed: lisa combines the util signals with the
> CPU frequencies and the Energy Model to produce an estimate (see the
> note after this list)
> - the amount of time spent in the overutilized state, in percentage.
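> 
> (For reference, the kernel's own Energy Model estimate, em_cpu_energy(),
> boils down per performance domain to roughly:
> 
>   energy ~= ps->cost * sum(cpu_util) / scale_cpu
> 
> where ps is the performance state needed to serve the domain's max util.
> I assume lisa's estimation follows the same logic, though its exact
> computation may differ.)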
>
> ------
>
> Workload1:
>
> Measured energy:
> +------+-------+--------------+--------------+------------+
> | util | count | without | with | ratio |
> +------+-------+--------------+--------------+------------+
> | 4 | 8 | 3220.970324 | 3312.097508 | 2.829184 |
> | 4 | 12 | 5942.486726 | 5016.106047 | -15.589108 |
> | 4 | 16 | 10412.26692 | 10017.633658 | -3.79008 |
> | 8 | 8 | 7524.271751 | 7479.451427 | -0.595677 |
> | 8 | 12 | 14782.214144 | 14567.282266 | -1.45399 |
> | 8 | 16 | 21452.863497 | 19561.143385 | -8.818031 |
> +------+-------+--------------+--------------+------------+
> Std:
> +------+-------+-------------+-------------+
> | util | count | without | with |
> +------+-------+-------------+-------------+
> | 4 | 8 | 165.563394 | 48.903514 |
> | 4 | 12 | 518.609612 | 81.170952 |
> | 4 | 16 | 329.729882 | 192.739245 |
> | 8 | 8 | 105.144497 | 336.796522 |
> | 8 | 12 | 384.615323 | 339.86986 |
> | 8 | 16 | 1252.735561 | 2563.268952 |
> +------+-------+-------------+-------------+
>
> Estimated energy:
> +------+-------+-----------+-----------+------------+
> | util | count | without | with | ratio |
> +------+-------+-----------+-----------+------------+
> | 4 | 8 | 1.4372e10 | 1.2791e10 | -11.000273 |
> | 4 | 12 | 3.1881e10 | 2.3743e10 | -25.526193 |
> | 4 | 16 | 5.7663e10 | 5.4079e10 | -6.215679 |
> | 8 | 8 | 2.5622e10 | 2.5337e10 | -1.109823 |
> | 8 | 12 | 6.4332e10 | 6.9335e10 | 7.776814 | [1]
> | 8 | 16 | 9.5285e10 | 8.2331e10 | -13.594508 |
> +------+-------+-----------+-----------+------------+
> Std:
> +------+-------+----------+-----------+
> | util | count | without | with |
> +------+-------+----------+-----------+
> | 4 | 8 | 1.3896e9 | 5.4265e8 |
> | 4 | 12 | 4.7511e9 | 5.1521e8 |
> | 4 | 16 | 3.5486e9 | 1.2625e9 |
> | 8 | 8 | 3.0033e8 | 2.3168e9 |
> | 8 | 12 | 8.7739e9 | 3.0743e9 |
> | 8 | 16 | 6.7982e9 | 2.2393e10 |
> +------+-------+----------+-----------+
>
> Overutilized ratio (in % of the 20s test):
> +------+-------+-----------+-----------+------------+
> | util | count | without | with | ratio |
> +------+-------+-----------+-----------+------------+
> | 4 | 8 | 0.187941 | 0.015834 | -91.575158 |
> | 4 | 12 | 0.543073 | 0.045483 | -91.624815 |
> | 4 | 16 | 8.510734 | 8.389077 | -1.429448 |
> | 8 | 8 | 1.056678 | 0.876095 | -17.089643 |
> | 8 | 12 | 36.457757 | 9.260862 | -74.598378 | [1]
> | 8 | 16 | 72.327933 | 78.693558 | 8.801061 |
> +------+-------+-----------+-----------+------------+
> Std:
> +------+-------+-----------+-----------+
> | util | count | without | with |
> +------+-------+-----------+-----------+
> | 4 | 8 | 0.232077 | 0.016531 |
> | 4 | 12 | 0.338637 | 0.040252 |
> | 4 | 16 | 0.729743 | 6.368214 |
> | 8 | 8 | 1.702964 | 1.722589 |
> | 8 | 12 | 34.436278 | 17.314564 |
> | 8 | 16 | 14.540217 | 33.77831 |
> +------+-------+-----------+-----------+
>
> ------
>
> Workload2:
>
> Measured energy:
> +------+-------+--------------+--------------+-----------+
> | util | count | without | with | ratio |
> +------+-------+--------------+--------------+-----------+
> | 4 | 8 | 3357.578785 | 3324.890715 | -0.973561 |
> | 4 | 12 | 5024.573746 | 4903.394533 | -2.411731 |
> | 4 | 16 | 10114.715431 | 9762.803821 | -3.479204 |
> | 8 | 8 | 7485.230678 | 6961.782086 | -6.993086 |
> | 8 | 12 | 13720.482516 | 13374.765825 | -2.519712 |
> | 8 | 16 | 24846.806317 | 24444.012805 | -1.621108 |
> +------+-------+--------------+--------------+-----------+
> Std:
> +------+-------+------------+------------+
> | util | count | without | with |
> +------+-------+------------+------------+
> | 4 | 8 | 87.450628 | 76.955783 |
> | 4 | 12 | 106.062839 | 116.882891 |
> | 4 | 16 | 182.525881 | 172.819307 |
> | 8 | 8 | 874.292359 | 162.790237 |
> | 8 | 12 | 151.830636 | 339.286741 |
> | 8 | 16 | 904.751446 | 154.419644 |
> +------+-------+------------+------------+
>
> Estimated energy:
> +------+-------+-----------+-----------+------------+
> | util | count | without | with | ratio |
> +------+-------+-----------+-----------+------------+
> | 4 | 8 | 1.4778e10 | 1.4805e10 | 0.184658 |
> | 4 | 12 | 2.6105e10 | 2.5485e10 | -2.374486 |
> | 4 | 16 | 5.8394e10 | 5.7177e10 | -2.083208 |
> | 8 | 8 | 3.0275e10 | 2.5973e10 | -14.211178 |
> | 8 | 12 | 7.0616e10 | 6.9085e10 | -2.168347 |
> | 8 | 16 | 1.3133e11 | 1.2891e11 | -1.839725 |
> +------+-------+-----------+-----------+------------+
> Std:
> +------+-------+----------+----------+
> | util | count | without | with |
> +------+-------+----------+----------+
> | 4 | 8 | 3.5449e8 | 8.2454e8 |
> | 4 | 12 | 9.4248e8 | 1.1364e9 |
> | 4 | 16 | 8.3240e8 | 1.2084e9 |
> | 8 | 8 | 9.0364e9 | 5.0381e8 |
> | 8 | 12 | 9.9112e8 | 3.0836e9 |
> | 8 | 16 | 4.9429e8 | 1.9533e9 |
> +------+-------+----------+----------+
>
> Overutilized ratio (in % of the 20s test):
> +------+-------+-----------+----------+------------+
> | util | count | without | with | ratio |
> +------+-------+-----------+----------+------------+
> | 4 | 8 | 0.154992 | 0.049429 | -68.108419 |
> | 4 | 12 | 0.132593 | 0.061762 | -53.420202 |
> | 4 | 16 | 6.798091 | 4.606102 | -32.244179 |
> | 8 | 8 | 1.360703 | 0.174626 | -87.166465 |
> | 8 | 12 | 0.519704 | 0.250469 | -51.805502 |
> | 8 | 16 | 12.114269 | 8.969281 | -25.961019 |
> +------+-------+-----------+----------+------------+
> Std:
> +------+-------+----------+----------+
> | util | count | without | with |
> +------+-------+----------+----------+
> | 4 | 8 | 0.212919 | 0.036856 |
> | 4 | 12 | 0.069696 | 0.060257 |
> | 4 | 16 | 0.63995 | 0.542028 |
> | 8 | 8 | 2.158079 | 0.211775 |
> | 8 | 12 | 0.089159 | 0.187436 |
> | 8 | 16 | 0.798565 | 1.669003 |
> +------+-------+----------+----------+
>
> ------
>
> Analysis:
>
> - [1]
> Without the patch, 2 tasks end up on one little CPU. This consumes
> less energy than using the medium/big CPU according to the energy model,
> but EAS should not be able to make such a placement, as the little
> CPU becomes overutilized.
> Without the patch, the system is overutilized a lot more than with the patch.
>
> -
> Looking at the overutilized ratio, being overutilized 0.5% of the time
> rather than 0.05% of the time might seem close, but it means that EAS
> ended up making a bad task placement multiple, independent times.
>
> -
> The overutilized ratio should be checked alongside the energy results, as
> it shows how much EAS was actually involved in the task placement.
>
> -
> Overall, less energy is consumed. The amount of energy saved varies
> with the workload.
>
> ------
>
> On another note, I wanted to ask whether there will be a v2 of the present
> patchset (sched/fair: Rework EAS to handle more cases).
yes, I have been sidetracked by other work since LPC and haven't
been able to finalize testing of v2, but it's ongoing
>
> Regards,
> Pierre
>
> ------
>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index bb343136ddd0..812d5bf88875 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1592,6 +1592,12 @@ struct task_struct {
> struct user_event_mm *user_event_mm;
> #endif
>
> + /*
> + * Keep track of the CPU feec() decided to migrate this task to, so
> + * the per-CPU 'eas_pending_enqueue' flag can be reset at enqueue.
> + */
> + int eas_target_cpu;
> +
> /*
> * New fields for task_struct should be added above here, so that
> * they are included in the randomized portion of task_struct.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c157d4860a3b..34911eb059cf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6945,6 +6945,8 @@ requeue_delayed_entity(struct sched_entity *se)
> se->sched_delayed = 0;
> }
>
> +DEFINE_PER_CPU(atomic_t, eas_pending_enqueue);
> +
> /*
> * The enqueue_task method is called before nr_running is
> * increased. Here we update the fair scheduling stats and
> @@ -7064,6 +7066,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> check_update_overutilized_status(rq);
>
> enqueue_throttle:
> + if (p->eas_target_cpu != -1) {
> + atomic_set(&per_cpu(eas_pending_enqueue, p->eas_target_cpu), 0);
> + p->eas_target_cpu = -1;
> + }
> +
> assert_list_leaf_cfs_rq(rq);
>
> hrtick_update(rq);
> @@ -8451,6 +8458,11 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> continue;
>
> + /* Skip this CPU as its util signal will be invalid soon. */
> + if (atomic_read(&per_cpu(eas_pending_enqueue, cpu)) &&
> + cpu != prev_cpu)
> + continue;
> +
> util = cpu_util(cpu, p, cpu, 0);
> cpu_cap = capacity_of(cpu);
>
> @@ -8560,6 +8572,17 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> ((best_fits < 0) && (best_actual_cap > prev_actual_cap)))
> target = best_energy_cpu;
>
> + /*
> + * 'Lock' the target CPU if there is a migration: prevent other feec()
> + * calls from using the same target until the util signals are updated.
> + */
> + if (prev_cpu != target) {
> + if (!atomic_cmpxchg_acquire(&per_cpu(eas_pending_enqueue, target), 0, 1))
> + p->eas_target_cpu = target;
> + else
> + target = prev_cpu;
> + }
> +
> return target;
>
> unlock: