Date:   Wed, 10 Oct 2018 09:29:35 +0100
From:   Quentin Perret <quentin.perret@....com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Thara Gopinath <thara.gopinath@...aro.org>,
        linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, rui.zhang@...el.com,
        gregkh@...uxfoundation.org, rafael@...nel.org,
        amit.kachhap@...il.com, viresh.kumar@...aro.org,
        javi.merino@...nel.org, edubezval@...il.com,
        daniel.lezcano@...aro.org, linux-pm@...r.kernel.org,
        ionela.voinescu@....com, vincent.guittot@...aro.org
Subject: Re: [RFC PATCH 0/7] Introduce thermal pressure

Hi Thara,

On Wednesday 10 Oct 2018 at 08:17:51 (+0200), Ingo Molnar wrote:
> 
> * Thara Gopinath <thara.gopinath@...aro.org> wrote:
> 
> > Thermal governors can respond to an overheat event for a cpu by
> > capping the cpu's maximum possible frequency. This in turn
> > means that the maximum available compute capacity of the
> > cpu is restricted. But today in the Linux kernel, in the event of
> > maximum frequency capping of a cpu, the maximum available compute
> > capacity of the cpu is not adjusted at all. In other words, the
> > scheduler is unaware of maximum cpu capacity restrictions placed
> > due to thermal activity. This patch series attempts to address this
> > issue. The benefits identified are better task placement among
> > available cpus in the event of overheating, which in turn leads to
> > better performance numbers.
> > 
> > The delta between the maximum possible capacity of a cpu and the
> > maximum available capacity of a cpu due to a thermal event can
> > be considered as thermal pressure. Instantaneous thermal pressure
> > is hard to record and can sometimes be erroneous, as there can be a
> > mismatch between the actual capping of capacity and the scheduler's
> > recording of it. Thus the solution is to have a weighted average
> > per-cpu value for thermal pressure over time. The weight reflects
> > the amount of time the cpu has spent at a capped maximum frequency.
> > To accumulate, average and appropriately decay thermal pressure,
> > this patch series uses PELT signals and reuses the available
> > framework that does similar bookkeeping of rt/dl task utilization.
> > 
> > Regarding testing, basic build, boot and sanity testing have been
> > performed on a hikey960 mainline kernel with a debian file system.
> > Further, aobench (an occlusion renderer for benchmarking real-world
> > floating point performance) showed the following results on hikey960
> > with debian.
> > 
> >                                         Result          Standard        Standard
> >                                         (Time secs)     Error           Deviation
> > Hikey 960 - no thermal pressure applied 138.67          6.52            11.52%
> > Hikey 960 -  thermal pressure applied   122.37          5.78            11.57%
> 
> Wow, +13% speedup, impressive! We definitely want this outcome.
> 
> I'm wondering what happens if we do not track and decay the thermal load at all at the PELT 
> level, but instantaneously decrease/increase effective CPU capacity in reaction to thermal 
> events we receive from the CPU.

+1, it's not that obvious (to me at least) that averaging the thermal
pressure over time is necessarily what we want. Say the thermal governor
caps a CPU and 'removes' 70% of its capacity: it will take forever for
the PELT signal to ramp up to that level before the scheduler can react.
And the other way around: if you release the cap, it'll take a while
before we actually start using the newly available capacity. I can also
imagine how reacting too fast can be counter-productive, but I guess
having numbers and/or use-cases to show that would be great :-)

Thara, have you tried to experiment with a simpler implementation as
suggested by Ingo?

Also, assuming that we do want to average things, do we actually want to
tie the thermal ramp-up time to the PELT half-life? That provides
nice maths properties wrt the other signals, but it's not obvious to me
that this thermal 'constant' should be the same on all platforms. Or
maybe it should?

Thanks,
Quentin

> 
> You describe the averaging as:
> 
> > Instantaneous thermal pressure is hard to record and can sometimes be erroneous, as there
> > can be a mismatch between the actual capping of capacity and the scheduler's recording of it.
> 
> Not sure I follow the argument here: are there bogus thermal throttling events? If so then
> they are hopefully infrequent and should average out over time even if we follow
> them instantly.
> 
> I.e. what is 'can sometimes be erroneous', exactly?
> 
> Thanks,
> 
> 	Ingo
