Date:   Sat, 16 Sep 2023 20:38:46 +0100
From:   Qais Yousef <qyousef@...alina.io>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Lukasz Luba <lukasz.luba@....com>
Subject: Re: [RFC PATCH 0/7] sched: cpufreq: Remove magic margins

On 09/12/23 19:18, Dietmar Eggemann wrote:
> On 08/09/2023 16:07, Qais Yousef wrote:
> > On 09/08/23 09:40, Dietmar Eggemann wrote:
> >> On 08/09/2023 02:17, Qais Yousef wrote:
> >>> On 09/07/23 15:08, Peter Zijlstra wrote:
> >>>> On Mon, Aug 28, 2023 at 12:31:56AM +0100, Qais Yousef wrote:
> 
> [...]
> 
> >>> And what was a high end A78 is a mid core today. So if you look at today's
> >>> mobile world topology we really have a tiny+big+huge combination of cores. The
> >>> bigs are called mids, but they're very capable. fits_capacity() forces migration
> >>> to the 'huge' cores too soon with that 80% margin. While the 80% might be too
> >>> small for the tiny ones as some workloads really struggle there if they hang on
> >>> for too long. It doesn't help that these systems ship with 4ms tick. Something
> >>> more to consider changing I guess.
> >>
> >> If this is the problem then you could simply make the margin (headroom)
> >> a function of cpu_capacity_orig?
> > 
> > I don't see what you mean. Instead of capacity_of(), but keeping the 80%?
> > 
> > Again, I could be delusional and misunderstanding everything, but what I really
> > see fits_capacity() being about is misfit detection. But a task is not really
> > misfit until it actually has a util above the capacity of the CPU. Now, due to
> > implementation details, there can be a delay between the task crossing this
> > capacity and us being able to move it. Which is what I believe this headroom is
> > trying to achieve.
> > 
> > I think we can better define this by tying this headroom to the worst case
> > scenario it takes to actually move this misfit task to the right CPU. If it can
> > continue to run without being impacted with this delay and crossing the
> > capacity of the CPU it is on, then we should not trigger misfit IMO.
> 
> 
> Instead of:
> 
>   fits_capacity(unsigned long util, unsigned long capacity)
> 
>       return approximate_util_avg(util, TICK_USEC) < capacity;
> 
> just make 1280 in:
> 
>   #define fits_capacity(cap, max) ((cap) * 1280 < (max) * 1024)
> 
> dependent on cpu's capacity_orig or the capacity diff to the next higher
> capacity_orig.
> 
> Typical example today: {little-medium-big capacity_orig} = {128, 896, 1024}
> 
> 896/128 = 7
> 
> 1024/896 = 1.14
> 
> to achieve higher margin on little and lower margin on medium.
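
The suggestion above could be sketched roughly as follows. This is purely illustrative: the helper names and the particular mapping from the capacity gap to a margin are mine, not kernel code or anything proposed verbatim in this thread.

```c
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Illustrative only: derive the fits_capacity() margin from the gap to the
 * next higher capacity_orig, so littles get a larger headroom than mediums.
 * 1024 means no headroom; headroom grows with the relative capacity gap.
 */
static unsigned long margin_of(unsigned long cap_orig, unsigned long next_cap_orig)
{
	return SCHED_CAPACITY_SCALE +
	       (SCHED_CAPACITY_SCALE * (next_cap_orig - cap_orig)) / next_cap_orig;
}

static int fits_capacity_scaled(unsigned long util, unsigned long cap,
				unsigned long next_cap)
{
	return util * margin_of(cap, next_cap) < cap * SCHED_CAPACITY_SCALE;
}
```

With the {128, 896, 1024} example this particular mapping yields a margin of 1901 on the littles and 1152 (~12.5% headroom) on the mediums, i.e. higher margin on little and lower margin on medium.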

I am not keen on this personally. These numbers seem arbitrary to me, and why
they would (or would not) help is not clear to me at least.

I do believe that the only reason we want to move before a task's util crosses
the capacity of the CPU is tied to the misfit load balance being able to move
the task. Until the task crosses the capacity, it is getting its computational
demand per our PELT representation. But since load balance is not an immediate
action (especially on our platforms where it is 4ms, something I hope we can
change), we need to preemptively mark the CPU as a misfit when we know the task
would get 'stuck' on it and not get its computational demand (as per our
representation, of course).

I think this removes all the guesswork and provides a meaningful
decision-making process that will scale transparently, so we utilize our
resources as best we can.

We can probably optimize the code to avoid the call to approximate_util_avg()
if this is a problem.

Why do you think the ratio of CPU capacities gives a more meaningful method of
judging misfit?


Thanks!

--
Qais Yousef
