Message-ID: <a6365f63-4669-15e5-b843-f4bfb1bd5e68@arm.com>
Date: Wed, 6 Sep 2023 10:18:15 +0100
From: Lukasz Luba <lukasz.luba@....com>
To: Qais Yousef <qyousef@...alina.io>
Cc: linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
"Rafael J. Wysocki" <rafael@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH 0/7] sched: cpufreq: Remove magic margins
Hi Qais,
On 8/28/23 00:31, Qais Yousef wrote:
> Since the introduction of EAS and schedutil, we had two magic 0.8 and 1.25
> margins applied in fits_capacity() and apply_dvfs_headroom().
>
> As reported two years ago in
>
> https://lore.kernel.org/lkml/1623855954-6970-1-git-send-email-yt.chang@mediatek.com/
>
> these values are not good fit for all systems and people do feel the need to
> modify them regularly out of tree.
That is true, in the Android kernel those are known 'features'. Furthermore,
in my game testing it looks like higher margins do help to reduce the
number of dropped frames, while on other types of workloads (e.g.
those that you have in the link above) the 0% margin shows better energy.
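
For reference, this is roughly how those two values appear in current
mainline (my paraphrase of kernel/sched/fair.c and
include/linux/sched/cpufreq.h, so the exact context may differ in your
tree):

/* A util 'fits' a CPU only if it stays below ~80% of its capacity:
 * util * 1280 < capacity * 1024  <=>  util < 0.8 * capacity
 */
#define fits_capacity(cap, max)        ((cap) * 1280 < (max) * 1024)

/* schedutil's DVFS headroom: request ~25% above the current util,
 * util + util/4 == util * 1.25
 */
static inline unsigned long map_util_perf(unsigned long util)
{
        return util + (util >> 2);
}
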
I also remember the results from MTK regarding the PELT HALF_LIFE
https://lore.kernel.org/all/0f82011994be68502fd9833e499749866539c3df.camel@mediatek.com/
The numbers for the 8ms half_life were showing a really nice improvement
for the 'min fps' metric. I got similar results with a higher margin.
IMO we can derive quite important information from those different
experiments:
More steady, sustained workloads like the "Yahoo browser" don't need a
margin. More unpredictable workloads like "Fortnite" (a shooter game
with an 'open world') need a decent margin.
The problem is that a periodic task can be 'noisy'. The low-pass
filter which is our exponentially weighted moving average, PELT, will
'smooth' the measured values. It suppresses sudden 'spikes' since
they are high-frequency changes. Those sudden 'spikes' are
the task activations where we need to compute a bit longer, e.g.
when there was an explosion in the game. The 25% margin helps us to
be ready for such a 'noisy' task - the CPU frequency (and capacity)
is higher. So if a sudden need for longer computation
shows up, we have enough 'idle time' (~25% idle) to serve it
properly and not lose the frame.
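
Just to put rough numbers on that 'idle time' (an illustration only,
not kernel code - the exact figures depend on how you count the
margin):

#include <stdio.h>

/* Spare room left by a given frequency headroom for a periodic task:
 * capacity is at least 'headroom' times the task's util, so relative
 * to the util we have (headroom - 1) spare, and within one period
 * roughly (1 - 1/headroom) of the time is idle.
 */
int main(void)
{
        const double headroom[] = { 1.25, 1.40 };

        for (unsigned int i = 0; i < 2; i++) {
                double h = headroom[i];

                printf("headroom %.2f: +%.0f%% capacity vs util, ~%.0f%% of the period idle\n",
                       h, (h - 1.0) * 100.0, (1.0 - 1.0 / h) * 100.0);
        }
        return 0;
}
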
The margin helps 'noisy' workloads in two ways:
1. in fits_capacity() it avoids a CPU which couldn't handle the task
and prefers CPUs with higher capacity
2. it asks for more 'idle time', e.g. 25-40% (depending on the margin),
to serve a sudden computation need
IIUC, your proposal is to:
1. extend the low-pass filter to a higher cut-off frequency, so we
could see those 'spikes' - that's the PELT HALF_LIFE boot
parameter set to 8ms (a toy decay sketch follows below this list)
1.1. You are likely to get a 'gift' from util_est,
which picks the max util_avg values and maintains them
for a while. That's why the 8ms PELT information can last longer
and you can get a higher frequency and longer idle time.
2. plumb in this new idea of dvfs_update_delay as the new
'margin' - this I don't understand
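
To make 1. a bit more concrete, here is a toy, floating-point
approximation of PELT's geometric decay (the real code is fixed-point
with 1ms segments, so treat the numbers as ballpark only). It shows
why a shorter half-life lets a sudden 'spike' become visible much
sooner:

#include <stdio.h>
#include <math.h>

/* util of a task that starts running flat out at t=0, with PELT
 * approximated as a continuous exponential decay:
 * util(t) = 1 - 0.5^(t / half_life)
 */
static double util_after(double ms, double half_life_ms)
{
        return 1.0 - pow(0.5, ms / half_life_ms);
}

int main(void)
{
        const double half_lives[] = { 32.0, 8.0 };

        for (unsigned int i = 0; i < 2; i++) {
                double hl = half_lives[i];

                printf("half_life %2.0f ms: util after 8 ms = %2.0f%%, after 32 ms = %2.0f%%\n",
                       hl, util_after(8.0, hl) * 100.0,
                       util_after(32.0, hl) * 100.0);
        }
        return 0;
}
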
As for 2., I don't see that the DVFS HW characteristics are the right
fit for this problem. We can have really fast DVFS HW,
but we still need some decent spare idle time for some workloads; those
are two independent issues IMO. You might get the longer
idle time thanks to 1.1, but that is a 'side effect'.
Could you explain a bit more why this dvfs_update_delay is
crucial here?
Regards,
Lukasz