linux-kernel - Re: [RFC PATCH 1/1] cpuidle: teo: Add optional util-awareness

Message-ID: <YymJz1pk5l2oKeAN@e126311.manchester.arm.com>
Date:   Tue, 20 Sep 2022 10:38:17 +0100
From:   Kajetan Puchalski <kajetan.puchalski@....com>
To:     Chen Yu <yu.chen.surf@...il.com>
Cc:     rafael@...nel.org, daniel.lezcano@...aro.org, lukasz.luba@....com,
        Dietmar.Eggemann@....com, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Chen Yu <yu.c.chen@...el.com>,
        Zhang Rui <rui.zhang@...el.com>,
        Len Brown <len.brown@...el.com>, kajetan.puchalski@....com
Subject: Re: [RFC PATCH 1/1] cpuidle: teo: Add optional util-awareness

> Not sure if we can use util_avg as schedutil, but it looks interesting.
> The last time I was trying to propose an idea to leverage util_avg to
> optimize some code in the kernel, it was suggested that it would be
> better to make the strategy gradual rather than a 0/1 state. So I was
> thinking if we could make it something like:
> 
> next_idx = cpuidle_select();
> next_idx = next_idx * (cpu_cap - util_avg) / cpu_cap;
> 
> The lower the util_avg is, the more we honor the choice of the governor,
> vice versa.

Would that be in order to still make use of the intermediate idle states
(i.e. the ones between the first and the last) or to change how the util
threshold works? It seems similar to the issue Doug pointed out.

I think there are two scenarios here: the idle landscape on Arm looks
very different from the one on x86/Intel, and we should probably account
for that. In our use case "gradual" and 0/1 amount to the same thing; it
all comes down to how you set the threshold. On x86, on the other hand,
you have to worry about both the threshold and the approach to state
selection.

This further convinces me that splitting this out into a separate
governor is preferable: it can work very well on some systems, like
ours, and very badly on others, like Doug's. We probably shouldn't be
bundling it with generic governors like TEO that work well across the
board.

It might also make sense to have slightly different implementations for
x86 and Arm to account for the hardware differences, but that would be
for Rafael to weigh in on.

> > This is now possible since the CPU utilization is exported from the scheduler with the
> > sched_cpu_util function and already used e.g. in the thermal governor IPA.
> >
> > This can provide drastically decreased latency and performance benefits in
> > certain types of mobile workloads that are sensitive to latency,
> > such as Geekbench 5.
> As Doug mentioned in another thread, the impact data to energy consumption would
> also be interesting.

I included energy consumption plots in the PDF linked in the cover
letter; here's the link:

https://github.com/mrkajetanp/lisa-notebooks/blob/a2361a5b647629bfbfc676b942c8e6498fb9bd03/idle_util_aware.pdf

The values on the plots are in mW, aggregated as a geometric mean
(gmean), so they reflect average power usage over the course of the
benchmark. They also include a 'shallow' column, which shows power
consumption with only C0 enabled and illustrates why this works on Arm
and how different it is from the x86 behaviour Doug described.

> thanks,
> Chenyu