linux-kernel - Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

Message-ID: <e4b472c9-ad8b-4423-81ad-02a1ab231f95@arm.com>
Date: Thu, 23 May 2024 10:06:04 +0100
From: Luis Machado <luis.machado@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, kprateek.nayak@....com,
 wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de, nd
 <nd@....com>, John Stultz <jstultz@...gle.com>, Hongyan.Xia2@....com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

Peter,

On 5/23/24 09:45, Peter Zijlstra wrote:
> On Mon, Apr 29, 2024 at 03:33:04PM +0100, Luis Machado wrote:
> 
>> (2) m6.6-eevdf-complete: m6.6-stock plus this series.
>> (3) m6.6-eevdf-complete-no-delay-dequeue: (2) + NO_DELAY_DEQUEUE
> 
>> +------------+------------------------------------------------------+-----------+
>> |  cluster   |                         tag                          | perc_diff |
>> +------------+------------------------------------------------------+-----------+
>> |    CPU     |                   m6.6-stock                         |   0.0%    |
>> |  CPU-Big   |                   m6.6-stock                         |   0.0%    |
>> | CPU-Little |                   m6.6-stock                         |   0.0%    |
>> |  CPU-Mid   |                   m6.6-stock                         |   0.0%    |
>> |    GPU     |                   m6.6-stock                         |   0.0%    |
>> |   Total    |                   m6.6-stock                         |   0.0%    |
> 
>> |    CPU     |        m6.6-eevdf-complete-no-delay-dequeue          |  117.77%  |
>> |  CPU-Big   |        m6.6-eevdf-complete-no-delay-dequeue          |  113.79%  |
>> | CPU-Little |        m6.6-eevdf-complete-no-delay-dequeue          |  97.47%   |
>> |  CPU-Mid   |        m6.6-eevdf-complete-no-delay-dequeue          |  189.0%   |
>> |    GPU     |        m6.6-eevdf-complete-no-delay-dequeue          |  -6.74%   |
>> |   Total    |        m6.6-eevdf-complete-no-delay-dequeue          |  103.84%  |
> 
> This one is still flummoxing me. I've gone over the patch a few times on
> different days and I'm not seeing it. Without DELAY_DEQUEUE it should
> behave as before.
> 
> Let me try and split this patch up into smaller parts such that you can
> try and bisect this.
> 

Same situation on my end. I've been chasing this for some time and I don't fully
understand why things go off the rails energy-wise as soon as DELAY_DEQUEUE is
enabled, now that the load_avg accounting red herring is gone.

I do have one additional piece of information though. Hopefully it will be useful.

When I boot the kernel with NO_DELAY_DEQUEUE (the feature defaulting to false), things
work fine. If I then switch to DELAY_DEQUEUE at runtime, things start using a lot more power.

The interesting bit is if I switch to NO_DELAY_DEQUEUE again at runtime, things don't
go back to normal. Rather they stay the same, using a lot more energy.
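
For anyone wanting to reproduce the runtime switch described above, here is a minimal
sketch in Python. It assumes the scheduler feature flags are exposed through debugfs at
/sys/kernel/debug/sched/features (not stated in this mail; it needs a mounted debugfs and
root), and the helper names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumption: debugfs location of the scheduler feature flags on recent
# kernels. Requires a mounted debugfs and root privileges.
SCHED_FEATURES = Path("/sys/kernel/debug/sched/features")


def set_sched_feature(name: str, enabled: bool, path: Path = SCHED_FEATURES) -> None:
    """Write NAME to enable a feature, or NO_NAME to disable it."""
    token = name if enabled else f"NO_{name}"
    path.write_text(token)


def sched_feature_enabled(name: str, path: Path = SCHED_FEATURES) -> bool:
    """Report a feature's state from the whitespace-separated flag list."""
    tokens = path.read_text().split()
    if name in tokens:
        return True
    if f"NO_{name}" in tokens:
        return False
    raise KeyError(f"unknown scheduler feature: {name}")
```

The sequence from this report would then be set_sched_feature("DELAY_DEQUEUE", True),
run the workload, set_sched_feature("DELAY_DEQUEUE", False), and run the workload again
while measuring energy each time.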

I wonder if we're leaving some unbalanced state somewhere while DELAY_DEQUEUE is on,
something that is signalling we have more load/utilization than we actually do.

The PELT signals look reasonable from what I can see. We don't seem to be boosting
frequencies, but we're running things mostly on big cores with DELAY_DEQUEUE on.

I'll keep investigating this. Please let me know if you need some additional data or
testing and I can get that going.