Date: Wed, 12 Jun 2024 16:08:08 +0100
From: Luis Machado <luis.machado@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, kprateek.nayak@....com,
 wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
 John Stultz <jstultz@...gle.com>, Hongyan.Xia2@....com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On 6/5/24 10:42, Peter Zijlstra wrote:
> On Wed, Jun 05, 2024 at 10:14:47AM +0100, Luis Machado wrote:
>> ... thanks for the patch! The above seems to do it for me. I can see
>> more reasonable energy use with the eevdf-complete series. Still a
>> bit higher. Might be noise, we'll see.
> 
> W00t!!!
> 
> Let me write a decent Changelog and stuff it in the git tree along with
> all the other bits.
> 
> Thanks for all the testing.

I've been doing some more testing of the eevdf-complete series with
the Pixel6/EAS platform. Hopefully these numbers will prove useful.

The energy regression from the original delayed-dequeue patch seems to
have been mostly fixed (with your proposed patch). Energy readings for
the big and mid cores are mostly stable and comparable with stock eevdf
(without eevdf-complete).

The only difference I can spot now is in the energy use of the little
cores. Compared to stock eevdf, the delayed-dequeue code seems to make
the energy use of the little cores a bit spiky, meaning sometimes we get
the expected level of energy use, but other times we get 40%, 60% or
even 90% more.

For instance...


(1) m6.6-eevdf-stock*: stock eevdf runs
(2) m6.6-eevdf-complete-ndd-dz*: eevdf-complete + NO_DELAY_DEQUEUE + DELAY_ZERO
(3) m6.6-eevdf-complete-dd-dz*: eevdf-complete + DELAY_DEQUEUE + DELAY_ZERO

+------------+---------------------------------+-----------+
|  channel   |                      tag        | perc_diff |
+------------+---------------------------------+-----------+
| CPU-Little |  m6.6-eevdf-stock-1             |   0.0%    |
| CPU-Little |  m6.6-eevdf-stock-2             |  -4.21%   |
| CPU-Little |  m6.6-eevdf-stock-3             |  -7.86%   |
| CPU-Little |  m6.6-eevdf-stock-4             |  -5.67%   |
| CPU-Little |  m6.6-eevdf-stock-5             |  -6.61%   |
| CPU-Little |  m6.6-eevdf-complete-ndd-dz-1   |  -2.21%   |
| CPU-Little |  m6.6-eevdf-complete-ndd-dz-2   |  -9.99%   |
| CPU-Little |  m6.6-eevdf-complete-ndd-dz-3   |   -6.1%   |
| CPU-Little |  m6.6-eevdf-complete-ndd-dz-4   |  -5.66%   |
| CPU-Little |  m6.6-eevdf-complete-ndd-dz-5   |  -7.12%   |
| CPU-Little |  m6.6-eevdf-complete-dd-dz-1    |  96.69%   |
| CPU-Little |  m6.6-eevdf-complete-dd-dz-2    |   22.1%   |
| CPU-Little |  m6.6-eevdf-complete-dd-dz-3    |  44.82%   |
| CPU-Little |  m6.6-eevdf-complete-dd-dz-4    |  -0.23%   |
| CPU-Little |  m6.6-eevdf-complete-dd-dz-5    |   8.28%   |
+------------+---------------------------------+-----------+
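To put a number on the spikiness, here is a minimal sketch that groups the
perc_diff values from the table above by configuration and prints the mean
and the min-to-max spread. The short keys ("stock", "ndd-dz", "dd-dz") and
the summarize() helper are just illustrative names, not part of any tooling
used to produce the table.

```python
from statistics import mean

# perc_diff values (in %) copied from the CPU-Little table above,
# grouped by configuration. Keys are shorthand for the tag prefixes.
perc_diff = {
    "stock": [0.0, -4.21, -7.86, -5.67, -6.61],
    "ndd-dz": [-2.21, -9.99, -6.1, -5.66, -7.12],
    "dd-dz": [96.69, 22.1, 44.82, -0.23, 8.28],
}


def summarize(values):
    """Return mean, min, max and min-to-max spread of a list of runs."""
    lo, hi = min(values), max(values)
    return {"mean": mean(values), "min": lo, "max": hi, "spread": hi - lo}


summary = {name: summarize(vals) for name, vals in perc_diff.items()}

for name, s in summary.items():
    print(f"{name:8s} mean={s['mean']:7.2f}%  "
          f"min={s['min']:7.2f}%  max={s['max']:7.2f}%  "
          f"spread={s['spread']:6.2f}")
```

With only five runs per configuration this is a rough indication rather
than a proper statistic, but the spread for (3) is an order of magnitude
larger than for (1) and (2).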

Looking at what might explain the spiky behavior with DELAY_DEQUEUE, I
noticed the idle residency data (we have 2 idle states) also shows some
spikiness and potential clues.

Looks like (1) and (2) manage to switch to idle states in a consistent
manner, whereas (3) seems a bit erratic and more prone to take a
shallower idle state (idle 0) as opposed to a deeper idle state (idle 1).

(1) and (2) seem to make better use of the deeper idle state.

+-------------------------------+---------+------------+-------+
|            tag                | cluster | idle_state | time  |
+-------------------------------+---------+------------+-------+
| m6.6-eevdf-stock-1            | little  |  not idle  | 63.49 |
| m6.6-eevdf-stock-1            | little  |   idle 0   | 30.66 |
| m6.6-eevdf-stock-1            | little  |   idle 1   | 12.15 |
| m6.6-eevdf-stock-2            | little  |  not idle  | 62.6  |
| m6.6-eevdf-stock-2            | little  |   idle 0   | 31.13 |
| m6.6-eevdf-stock-2            | little  |   idle 1   | 14.56 |
| m6.6-eevdf-stock-3            | little  |  not idle  | 63.98 |
| m6.6-eevdf-stock-3            | little  |   idle 0   | 31.54 |
| m6.6-eevdf-stock-3            | little  |   idle 1   | 15.91 |
| m6.6-eevdf-stock-4            | little  |  not idle  | 64.18 |
| m6.6-eevdf-stock-4            | little  |   idle 0   | 31.32 |
| m6.6-eevdf-stock-4            | little  |   idle 1   | 15.83 |
| m6.6-eevdf-stock-5            | little  |  not idle  | 63.32 |
| m6.6-eevdf-stock-5            | little  |   idle 0   | 30.4  |
| m6.6-eevdf-stock-5            | little  |   idle 1   | 14.33 |
| m6.6-eevdf-complete-ndd-dz-1  | little  |  not idle  | 62.62 |
| m6.6-eevdf-complete-ndd-dz-1  | little  |   idle 0   | 29.48 |
| m6.6-eevdf-complete-ndd-dz-1  | little  |   idle 1   | 13.19 |
| m6.6-eevdf-complete-ndd-dz-2  | little  |  not idle  | 64.12 |
| m6.6-eevdf-complete-ndd-dz-2  | little  |   idle 0   | 27.62 |
| m6.6-eevdf-complete-ndd-dz-2  | little  |   idle 1   | 14.73 |
| m6.6-eevdf-complete-ndd-dz-3  | little  |  not idle  | 62.86 |
| m6.6-eevdf-complete-ndd-dz-3  | little  |   idle 0   | 27.87 |
| m6.6-eevdf-complete-ndd-dz-3  | little  |   idle 1   | 14.97 |
| m6.6-eevdf-complete-ndd-dz-4  | little  |  not idle  | 63.01 |
| m6.6-eevdf-complete-ndd-dz-4  | little  |   idle 0   | 28.2  |
| m6.6-eevdf-complete-ndd-dz-4  | little  |   idle 1   | 14.11 |
| m6.6-eevdf-complete-ndd-dz-5  | little  |  not idle  | 62.1  |
| m6.6-eevdf-complete-ndd-dz-5  | little  |   idle 0   | 29.06 |
| m6.6-eevdf-complete-ndd-dz-5  | little  |   idle 1   | 14.73 |
| m6.6-eevdf-complete-dd-dz-1   | little  |  not idle  | 46.18 |
| m6.6-eevdf-complete-dd-dz-1   | little  |   idle 0   | 53.78 |
| m6.6-eevdf-complete-dd-dz-1   | little  |   idle 1   | 3.75  |
| m6.6-eevdf-complete-dd-dz-2   | little  |  not idle  | 57.64 |
| m6.6-eevdf-complete-dd-dz-2   | little  |   idle 0   | 40.47 |
| m6.6-eevdf-complete-dd-dz-2   | little  |   idle 1   | 7.39  |
| m6.6-eevdf-complete-dd-dz-3   | little  |  not idle  | 43.14 |
| m6.6-eevdf-complete-dd-dz-3   | little  |   idle 0   | 57.73 |
| m6.6-eevdf-complete-dd-dz-3   | little  |   idle 1   | 3.65  |
| m6.6-eevdf-complete-dd-dz-4   | little  |  not idle  | 58.97 |
| m6.6-eevdf-complete-dd-dz-4   | little  |   idle 0   | 36.4  |
| m6.6-eevdf-complete-dd-dz-4   | little  |   idle 1   | 9.42  |
| m6.6-eevdf-complete-dd-dz-5   | little  |  not idle  | 55.85 |
| m6.6-eevdf-complete-dd-dz-5   | little  |   idle 0   | 36.75 |
| m6.6-eevdf-complete-dd-dz-5   | little  |   idle 1   | 13.14 |
+-------------------------------+---------+------------+-------+
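As a rough way to compare the three configurations, the sketch below
averages the little-cluster residency per idle state across the five runs
and computes which fraction of the total idle time lands in the deeper
state (idle 1). The values are copied from the table above and used as-is,
since the table does not state the unit of the time column; the dictionary
layout and the deep_idle_share() helper are illustrative only.

```python
from statistics import mean

# Little-cluster residency per run, copied from the table above.
# Unit of the "time" column is not stated, so values are used as-is.
residency = {
    "stock": {
        "not idle": [63.49, 62.6, 63.98, 64.18, 63.32],
        "idle 0": [30.66, 31.13, 31.54, 31.32, 30.4],
        "idle 1": [12.15, 14.56, 15.91, 15.83, 14.33],
    },
    "ndd-dz": {
        "not idle": [62.62, 64.12, 62.86, 63.01, 62.1],
        "idle 0": [29.48, 27.62, 27.87, 28.2, 29.06],
        "idle 1": [13.19, 14.73, 14.97, 14.11, 14.73],
    },
    "dd-dz": {
        "not idle": [46.18, 57.64, 43.14, 58.97, 55.85],
        "idle 0": [53.78, 40.47, 57.73, 36.4, 36.75],
        "idle 1": [3.75, 7.39, 3.65, 9.42, 13.14],
    },
}


def deep_idle_share(states):
    """Fraction of mean idle time spent in idle 1 rather than idle 0."""
    shallow = mean(states["idle 0"])
    deep = mean(states["idle 1"])
    return deep / (shallow + deep)


means = {
    cfg: {state: mean(vals) for state, vals in states.items()}
    for cfg, states in residency.items()
}
shares = {cfg: deep_idle_share(states) for cfg, states in residency.items()}

for cfg in residency:
    m = means[cfg]
    print(f"{cfg:8s} not idle={m['not idle']:6.2f}  idle 0={m['idle 0']:6.2f}  "
          f"idle 1={m['idle 1']:6.2f}  deep share={shares[cfg]:.2f}")
```

On these numbers, (1) and (2) average about the same idle 1 residency,
while (3) averages roughly half of that and spends correspondingly more
time in idle 0.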

I can't draw a precise conclusion, but it might be down to delayed util_est
updates or even the additional time the delayed-dequeue tasks spend on the
runqueue. But delayed-dequeue does change the overall behavior a bit on
these heterogeneous platforms, energy-wise.


