linux-kernel - Re: [Patch v5 0/6] Introduce Thermal Pressure

Message-ID: <cb27d440-1421-b95e-19c3-4278dd37efda@arm.com>
Date:   Tue, 12 Nov 2019 11:21:17 +0000
From:   Lukasz Luba <Lukasz.Luba@....com>
To:     Thara Gopinath <thara.gopinath@...aro.org>
CC:     "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        Ionela Voinescu <Ionela.Voinescu@....com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "rui.zhang@...el.com" <rui.zhang@...el.com>,
        "edubezval@...il.com" <edubezval@...il.com>,
        "qperret@...gle.com" <qperret@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "amit.kachhap@...il.com" <amit.kachhap@...il.com>,
        "javi.merino@...nel.org" <javi.merino@...nel.org>,
        "daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>
Subject: Re: [Patch v5 0/6] Introduce Thermal Pressure

Hi Thara,

I am going to try your patch set on a different board.
To do that I need more information about your setup.
Please find my comments below. I will probably need one hack,
which I do not fully understand.

On 11/5/19 6:49 PM, Thara Gopinath wrote:
> Thermal governors can respond to an overheat event of a cpu by
> capping the cpu's maximum possible frequency. This in turn
> means that the maximum available compute capacity of the
> cpu is restricted. But today in the kernel, task scheduler is
> not notified of capping of maximum frequency of a cpu.
> In other words, scheduler is unaware of maximum capacity
> restrictions placed on a cpu due to thermal activity.
> This patch series attempts to address this issue.
> The benefits identified are better task placement among available
> cpus in event of overheating which in turn leads to better
> performance numbers.
>
> The reduction in the maximum possible capacity of a cpu due to a
> thermal event can be considered as thermal pressure. Instantaneous
> thermal pressure is hard to record and can sometimes be erroneous
> as there can be mismatch between the actual capping of capacity
> and scheduler recording it. Thus solution is to have a weighted
> average per cpu value for thermal pressure over time.
> The weight reflects the amount of time the cpu has spent at a
> capped maximum frequency. Since thermal pressure is recorded as
> an average, it must be decayed periodically. Existing algorithm
> in the kernel scheduler pelt framework is re-used to calculate
> the weighted average. This patch series also defines a sysctl
> interface to allow for a configurable decay period.
>
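The averaging idea in the paragraph above can be sketched roughly as
follows. This is a simplified user-space illustration in Python, not the
code from the patch series. The names thermal_pressure and DecayingAverage
are hypothetical, and the plain exponential decay is a stand-in for PELT
(which halves a contribution roughly every 32 ms by default), so treat it
as an assumption-laden model only.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point value the scheduler uses for full capacity


def thermal_pressure(max_capacity, capped_freq, max_freq):
    """Capacity lost to a frequency cap, in the same units as max_capacity.

    Assumes capacity scales linearly with frequency (illustrative only).
    """
    capped_capacity = max_capacity * capped_freq // max_freq
    return max_capacity - capped_capacity


class DecayingAverage:
    """Exponentially decaying average with a configurable half-life.

    Simplified stand-in for PELT: the weight of an old sample halves
    every half_life_ms milliseconds.
    """

    def __init__(self, half_life_ms):
        # Per-millisecond decay factor y, chosen so that y ** half_life_ms == 0.5
        self.decay_per_ms = 0.5 ** (1.0 / half_life_ms)
        self.value = 0.0

    def update(self, sample, elapsed_ms):
        # Decay the old average over elapsed_ms, then blend in the sample
        # that was in effect during that interval.
        keep = self.decay_per_ms ** elapsed_ms
        self.value = self.value * keep + sample * (1.0 - keep)
        return self.value
```

For example, thermal_pressure(SCHED_CAPACITY_SCALE, 1200000, 2400000)
gives 512, i.e. half the capacity is lost when the maximum frequency is
capped to half. Feeding that into DecayingAverage(32) for 32 ms moves the
average about halfway towards 512; a longer half-life makes the average
react more slowly, both when the cap is applied and when it is lifted.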
> Regarding testing, basic build, boot and sanity testing have been
> performed on db845c platform with debian file system.
> Further, dhrystone and hackbench tests have been
> run with the thermal pressure algorithm. During testing, due to
> constraints of step wise governor in dealing with big little systems,
I don't understand this modification. Could you explain what the
issue was and whether this modification broke the original
thermal solution up front? You are then comparing against this modified
version and treating it as the 'origin', am I right?

> trip point 0 temperature was made asymmetric between cpus in little
> cluster and big cluster; the idea being that
> big core will heat up and cpu cooling device will throttle the
> frequency of the big cores faster, thereby limiting the maximum available
> capacity and the scheduler will spread out tasks to little cores as well.
>
> Test Results
>
> Hackbench: 1 group , 30000 loops, 10 runs
>                                                 Result         SD
>                                                 (Secs)     (% of mean)
>   No Thermal Pressure                            14.03       2.69%
>   Thermal Pressure PELT Algo. Decay : 32 ms      13.29       0.56%
>   Thermal Pressure PELT Algo. Decay : 64 ms      12.57       1.56%
>   Thermal Pressure PELT Algo. Decay : 128 ms     12.71       1.04%
>   Thermal Pressure PELT Algo. Decay : 256 ms     12.29       1.42%
>   Thermal Pressure PELT Algo. Decay : 512 ms     12.42       1.15%
>
> Dhrystone Run Time  : 20 threads, 3000 MLOOPS
>                                                   Result      SD
>                                                   (Secs)    (% of mean)
>   No Thermal Pressure                              9.452      4.49%
>   Thermal Pressure PELT Algo. Decay : 32 ms        8.793      5.30%
>   Thermal Pressure PELT Algo. Decay : 64 ms        8.981      5.29%
>   Thermal Pressure PELT Algo. Decay : 128 ms       8.647      6.62%
>   Thermal Pressure PELT Algo. Decay : 256 ms       8.774      6.45%
>   Thermal Pressure PELT Algo. Decay : 512 ms       8.603      5.41%
>
For these performance results I would also like to see the
average temperature of the chip. Is it higher than in the 'origin'?

Regards,
Lukasz Luba
