linux-kernel - Re: [RFC PATCH 0/7] Introduce thermal pressure


Message-ID: <20181016073305.GA64994@gmail.com>
Date:   Tue, 16 Oct 2018 09:33:05 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Thara Gopinath <thara.gopinath@...aro.org>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, rui.zhang@...el.com,
        gregkh@...uxfoundation.org, rafael@...nel.org,
        amit.kachhap@...il.com, viresh.kumar@...aro.org,
        javi.merino@...nel.org, edubezval@...il.com,
        daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
        quentin.perret@....com, ionela.voinescu@....com,
        vincent.guittot@...aro.org
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure


* Thara Gopinath <thara.gopinath@...aro.org> wrote:

> >> Regarding testing, basic build, boot and sanity testing have been
> >> performed on hikey960 with a mainline kernel and a Debian file system.
> >> Further, aobench (an occlusion renderer for benchmarking real-world
> >> floating point performance) showed the following results on hikey960
> >> with Debian.
> >>
> >>                                         Result          Standard        Standard
> >>                                         (Time secs)     Error           Deviation
> >> Hikey 960 - no thermal pressure applied 138.67          6.52            11.52%
> >> Hikey 960 -  thermal pressure applied   122.37          5.78            11.57%
> > 
> > Wow, +13% speedup, impressive! We definitely want this outcome.
> > 
> > I'm wondering what happens if we do not track and decay the thermal 
> > load at all at the PELT level, but instantaneously decrease/increase 
> > effective CPU capacity in reaction to thermal events we receive from 
> > the CPU.
> 
> The problem with instantaneous updates is that sometimes thermal events 
> happen at a much faster pace than cpu_capacity is updated in the 
> scheduler. This means that by the time the scheduler uses the 
> value, it might not be correct anymore.

Let me offer a different interpretation: if we average throttling events 
then we create a 'smooth' average of 'true CPU capacity' that doesn't 
fluctuate much. This allows more stable yet asymmetric task placement if 
the thermal characteristics of the different cores are different 
(asymmetric). This, compared to instantaneous updates, would reduce 
unnecessary task migrations between cores.

Is that accurate?
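
To make the interpretation above concrete, here is a minimal sketch in
Python. It is not the PELT code from the patch set: a plain exponentially
weighted moving average stands in for the decayed signal, and the capacity
traces, the alpha value and the helper names (ewma, count_flips) are all
made up for illustration. It counts how often the "higher capacity" CPU
changes between two cores, once with raw values and once with averaged ones.

```python
def ewma(samples, alpha):
    """Exponentially weighted moving average of a capacity trace.

    alpha is the weight given to the newest sample (hypothetical value,
    not derived from PELT's real decay constants).
    """
    avg = samples[0]
    out = []
    for s in samples:
        avg = alpha * s + (1 - alpha) * avg
        out.append(avg)
    return out


def count_flips(cap_a, cap_b):
    """Count how often the higher-capacity CPU changes between samples."""
    flips = 0
    prev = None
    for a, b in zip(cap_a, cap_b):
        best = "A" if a >= b else "B"
        if prev is not None and best != prev:
            flips += 1
        prev = best
    return flips


# Hypothetical traces, 1024 = unthrottled capacity.
cpu_a = [1024, 600, 1024, 600, 1024, 600, 1024, 600]  # throttled every other sample
cpu_b = [700] * 8                                     # steadily capped

instant_flips = count_flips(cpu_a, cpu_b)
smooth_flips = count_flips(ewma(cpu_a, 0.25), ewma(cpu_b, 0.25))

print("flips with instantaneous capacity:", instant_flips)
print("flips with averaged capacity:     ", smooth_flips)
```

With these made-up traces the raw values change the preferred CPU on every
sample, while the averaged values never do. Whether real thermal events
behave like this is exactly the open question.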

If the thermal characteristics of the cores are roughly symmetric and the 
measured CPU-intense load itself is symmetric as well, then I have 
trouble seeing why reacting to thermal events should make any difference 
at all.
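
The symmetric case can be sketched the same way. The following assumes,
purely for illustration, that placement depends only on each CPU's share
of the total capacity. The capacity numbers and the capacity_shares helper
are hypothetical and not taken from the scheduler.

```python
def capacity_shares(capacities):
    """Return each CPU's fraction of the total capacity."""
    total = sum(capacities)
    return [c / total for c in capacities]


unthrottled = [1024, 1024, 1024, 1024]
# Every core throttled by the same amount (symmetric thermal behaviour).
symmetric = [768, 768, 768, 768]
# Only two of the cores throttled (asymmetric thermal behaviour).
asymmetric = [1024, 1024, 512, 512]

print(capacity_shares(unthrottled))
print(capacity_shares(symmetric))
print(capacity_shares(asymmetric))
```

Under that assumption, symmetric throttling leaves every share unchanged,
so there is nothing for placement to react to. Only the asymmetric case
shifts the shares.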

Are there any inherent asymmetries in the thermal properties of the 
cores, or in the benchmarked workload itself?

Thanks,

	Ingo