Date:   Wed, 6 Feb 2019 16:05:41 +0530
From:   Amit Kucheria <amit.kucheria@...aro.org>
To:     Matthias Kaehlcke <mka@...omium.org>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Eduardo Valentin <edubezval@...il.com>,
        Andy Gross <andy.gross@...aro.org>,
        Taniya Das <tdas@...eaurora.org>,
        Stephen Boyd <swboyd@...omium.org>,
        Doug Anderson <dianders@...omium.org>,
        David Brown <david.brown@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        DTML <devicetree@...r.kernel.org>
Subject: Re: [PATCH v3 1/1] arm64: dts: sdm845: wireup the thermal trip points
 to cpufreq

On Sat, Jan 26, 2019 at 3:50 AM Matthias Kaehlcke <mka@...omium.org> wrote:
> > > >                   trips {
> > > > -                         cpu_alert0: trip0 {
> > > > +                         cpu0_alert1: trip-point@0 {
> > > >                                   temperature = <75000>;
> > >
> > > In my observations a 'switch on/threshold' temperature of 75 degrees
> > > leads to aggressive throttling with IPA when the temperature is above
> > > this threshold:
> > >
> > > [  716.760804] cpu_cooling_ratelimit: 31 callbacks suppressed
> > > [  716.760836] cpu cpu4: Cooling state set to 10. New max freq = 1920000
> > > [  716.773390] power_allocator_ratelimit: 15 callbacks suppressed
> > > [  716.773405] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=73500, curr_temp=75200 total_requested_power=39025 total_granted_power=18654
> > > [  749.609336] cpu_cooling_ratelimit: 45 callbacks suppressed
> > > [  749.609371] cpu cpu4: Cooling state set to 11. New max freq = 1843200
> > > [  749.624300] power_allocator_ratelimit: 24 callbacks suppressed
> > > [  749.624323] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=70800, curr_temp=77200 total_requested_power=40136 total_granted_power=17402
> > > [  780.152633] cpu_cooling_ratelimit: 41 callbacks suppressed
> > > [  780.152666] cpu cpu4: Cooling state set to 11. New max freq = 1843200
> > > [  780.165247] power_allocator_ratelimit: 21 callbacks suppressed
> > > [  780.165261] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=64800, curr_temp=76900 total_requested_power=39719 total_granted_power=1759
> > >
> > > (the logs come from a local patch in our tree:
> > > https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ec1c501a8093fed44a6697a5913ef2765f518e1f)
> > >
> > > At this point I don't have a clear idea what would be a reasonable
> > > value for the 'switch on/threshold' temperature, but probably it
> > > should be higher than 75 degrees, at least with IPA. If there is
> > > no reasonable common configuration for different thermal governors I
> > > guess we'll have to target a commonly used governor and systems
> > > using other 'incompatible' governors need to override the config in
> > > their <board>.dtsi.
> >

Thanks for the elaborate testing and for sharing the numbers. This is
very useful information.

> > On my system I don't see a significant delta in core temperatures for
> > 'threshold' temperatures of 80, 85 or 90°C. However Dhrystone
> > performance goes up by ~8% when changing the trip point from 80 to
> > 85°C. For a switch from 85 to 90°C I see a ~2% performance delta. For
> > all trip points the average core temperatures are ~80°C (silver) and
> > ~85°C (gold). Interestingly I observed the highest average
> > temperatures with the trip point at 80°C (repeated measurements were
> > taken for different temperatures).
> >
> > Supposedly LMH throttling is disabled in the firmware I used for
> > these tests, however data suggests that it is still active
> > (temperature doesn't rise beyond 95°C, even without throttling in
> > Linux; Dhrystone performance drops when raising the temperature beyond
> > 95°C with a heat gun). I will do some more testing when I get my hands
> > on a FW that effectively disables LMH (or raises the threshold to
> > something like 105°C).
> >
> > From the data collected so far I'd suggest a 'threshold' temperature
> > of 90°C or, if that seems too high, 85°C. Behavior might be different
> > with other thermal governors or without LMH throttling.
>
> Some more data from measurements with different trip points, for the
> IPA and the Fair Share governors. LMH throttling was enabled:
>
>                         IPA
> Trip    Dhrystone       Temp Silver     Temp Gold
> (°C)                    (°C)            (°C)
> 75      6M              78.4            84.9
> 80      6.21M           81.4            89.8
> 85      6.74M           81.7            88.2
> 90      6.88M           79.4            84.6
>
>                         Fair Share
> Trip    Dhrystone       Temp Silver     Temp Gold
> (°C)                    (°C)            (°C)
> 75      6.63M           80.1            88.5
> 80      6.71M           80.1            88.5
> 85      6.77M           81.1            87.8
> 90      7.12M           81.2            87.8

Interesting that you get more MIPS out of the Fair Share governor than
out of IPA across the board. Which devices were providing energy cost
information (dynamic-power-coefficient) to the IPA engine? Just the
CPU and GPU? Can you point me to those patches in Gerrit?
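
For reference, the energy cost information in question is the per-CPU
property sketched below. This is only an illustration: the CPU0 label
and the coefficient value are placeholders, not numbers taken from any
posted patch.

```dts
/* Illustrative only: the CPU0 label and the value 100 are placeholders. */
&CPU0 {
	/* Units per the CPU binding: uW/MHz/V^2 */
	dynamic-power-coefficient = <100>;
};
```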

> Within this range the 'threshold' temperature doesn't seem to have a
> large impact on the average CPU temperature. There is a bit of
> fluctuation between individual measurements, I wouldn't be surprised
> if the outliers of Temp Gold for 75 and 90°C converged more with the
> other values with some more measurements.
>
> I learned how to effectively disable LMH throttling, however with that
> it was fairly easy to have the CPUs overheat, even with throttling in
> Linux. If it is feasible at all to run with LMH disabled some more
> actions will be needed (e.g. attaching a heatsink or interrupt support
> for thermal sensors instead of polling, ...).

Given that LMH kicks in at 95°C and IPA manages to keep temperatures
in the 80-90°C range regardless of the trip point value, I agree that
we should move the first trip point to 90°C, which gives the best
performance. In "threshold" and "target" terms, 90°C becomes the
threshold. Since LMH kicks in at 95°C, I've left that as the target
trip.
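
As a rough sketch of what that means for one CPU thermal zone (the
zone name, labels and hysteresis values here are assumptions for
illustration, not the actual patch contents; temperatures are in
millidegrees Celsius, as in the quoted hunk):

```dts
/* Sketch only: zone name, labels and hysteresis are assumed. */
cpu0-thermal {
	trips {
		/* "threshold": first passive trip, moved up to 90°C */
		cpu0_alert0: trip-point@0 {
			temperature = <90000>;
			hysteresis = <2000>;
			type = "passive";
		};

		/* "target": left at 95°C, where LMH kicks in */
		cpu0_alert1: trip-point@1 {
			temperature = <95000>;
			hysteresis = <2000>;
			type = "passive";
		};
	};
};
```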

These should be sane defaults for upstream and any device can override
those numbers in their board file.
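
A board-level override could look roughly like the fragment below. The
cpu0_alert0 label is hypothetical and 85000 is just an example value
(one of the trip temperatures Matthias measured), so adjust both to
whatever the SoC dtsi and the board actually need.

```dts
/* <board>.dts: hypothetical override of the first ("threshold") trip */
&cpu0_alert0 {
	temperature = <85000>;	/* millidegrees Celsius */
};
```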

Thanks again for your thorough review.

Regards,
Amit
