Date:   Fri, 13 Sep 2019 16:24:18 +0200
From:   "H. Nikolaus Schaller" <hns@...delico.com>
To:     Adam Ford <aford173@...il.com>
Cc:     Linux-OMAP <linux-omap@...r.kernel.org>,
        Tony Lindgren <tony@...mide.com>,
        André Roth <neolynx@...il.com>,
        Discussions about the Letux Kernel 
        <letux-kernel@...nphoenux.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andreas Kemnade <andreas@...nade.info>,
        Nishanth Menon <nm@...com>, Adam Ford <adam.ford@...icpd.com>,
        kernel@...a-handheld.com
Subject: Re: [RFC] ARM: dts: omap36xx: Enable thermal throttling


> Am 13.09.2019 um 16:05 schrieb Adam Ford <aford173@...il.com>:
> 
> On Fri, Sep 13, 2019 at 8:32 AM H. Nikolaus Schaller <hns@...delico.com> wrote:
>> 
>> Hi Adam,
>> 
>>> Am 13.09.2019 um 13:07 schrieb Adam Ford <aford173@...il.com>:
>> 
>>>>> +     cpu_cooling_maps: cooling-maps {
>>>>> +             map0 {
>>>>> +                     trip = <&cpu_alert0>;
>>>>> +                     /* Only allow OPP50 and OPP100 */
>>>>> +                     cooling-device = <&cpu 0 1>;
>>>> 
>>>> omap4-cpu-thermal.dtsi uses THERMAL_NO_LIMIT constants but I do not
>>>> understand their meaning (and how it relates to the opp list).
>>> 
>>> I read through the documentation, but it wasn't completely clear to
>>> me. AFAICT, the numbers after &cpu represent the min and max index in
>>> the OPP table when the condition is hit.
>> 
>> Ok. It seems to use "cooling state" for those and the first is minimum
>> and the last is maximum. Using THERMAL_NO_LIMIT (-1UL) means to have
>> no limits.
>> 
>> Since here we use the &cpu node it is likely that the "cooling state"
>> is the same as the OPP index currently in use.
>> 
>> I have looked through the .dts files which use cpu_crit and the picture is
>> not uniform...
>> 
>> omap4           seems to only define it
>> am57xx          has two different grade dtsi files
>> dra7            overwrites critical temperature value
>> am57xx-beagle   defines a gpio to control a fan
> 
> Check out rk3288-veyron-mickey.dts
> 
> They have almost_warm, warm, almost_hot, hot, hotter, very_hot, and
> critical for trips, and they have as many corresponding cooling maps
> which appear to limit the CPU speeds, but their index references are
> still confusing to me.

That seems to be quite sophisticated.

The arch/arm/boot/dts/exynos5422-odroidxu3-common.dtsi also has a lot
of trip points. So there may be very different needs...

But it has potentially helpful comments...

				/* 
				 * When reaching cpu0_alert3, reduce CPU
				 * by 2 steps. On Exynos5422/5800 that would
				 * be: 1600 MHz and 1100 MHz.
				 */
				map3 {
					trip = <&cpu0_alert3>;
					cooling-device = <&cpu0 0 2>;
				};
				map4 {
					trip = <&cpu0_alert3>;
					cooling-device = <&cpu4 0 2>;
				};
				/*
				 * When reaching cpu0_alert4, reduce CPU
				 * further, down to 600 MHz (12 steps for big,
				 * 7 steps for LITTLE).
				 */
				map5 {
					trip = <&cpu0_alert4>;
					cooling-device = <&cpu0 3 7>;
				};
				map6 {
					trip = <&cpu0_alert4>;
					cooling-device = <&cpu4 3 12>;
				};

That would mean the second integer says something about how many
steps the CPU frequency is reduced by.

But the first one is not explained.
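
To make the question concrete, here is a minimal C sketch of one possible
reading. It is an assumption on my side, not something confirmed in this
thread: the two cells form a window [lower, upper] of cooling states,
THERMAL_NO_LIMIT opens the respective end of the window, and state 0 is the
fastest OPP with every further state one step slower. The table freq_mhz[]
and the helpers clamp_state() and state_to_mhz() are made up for
illustration, and NO_LIMIT only stands in for THERMAL_NO_LIMIT.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for THERMAL_NO_LIMIT (assumption: an all-ones 32-bit cell). */
#define NO_LIMIT UINT32_MAX

/* Hypothetical frequency table in MHz, sorted from fastest to slowest. */
static const unsigned int freq_mhz[] = { 1000, 800, 600, 300 };
#define NUM_STATES ((unsigned int)(sizeof(freq_mhz) / sizeof(freq_mhz[0])))

/*
 * Clamp a requested cooling state into the [lower, upper] window taken
 * from the two cells after the phandle. NO_LIMIT opens that end.
 */
unsigned int clamp_state(uint32_t lower, uint32_t upper,
			 unsigned int requested)
{
	unsigned int lo = (lower == NO_LIMIT) ? 0 : lower;
	unsigned int hi = (upper == NO_LIMIT) ? NUM_STATES - 1 : upper;

	assert(lo <= hi && hi < NUM_STATES);

	if (requested < lo)
		return lo;
	if (requested > hi)
		return hi;
	return requested;
}

/* Assumed mapping: state 0 is the fastest entry, each state one step down. */
unsigned int state_to_mhz(unsigned int state)
{
	assert(state < NUM_STATES);
	return freq_mhz[state];
}
```

If this reading is right, a window of 0..1 would keep the two fastest
entries of such a table rather than the two slowest, so the meaning of the
indices should be checked in the driver code before relying on them.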

BTW: this also demonstrates how a single trip point can map to multiple
cooling-device actions (something we likely do not need).

> 
> For that device,
> Warm and no limit first, then 4:   cooling-device = <&cpu0 THERMAL_NO_LIMIT 4>
> ...
> very_hot uses a number then no limit: cooling-device = <&cpu0 8
> THERMAL_NO_LIMIT>
> 
> This makes me wonder if the min and max are switched or the index
> values go backwards.

Could it depend on the specific CPU driver? Maybe omap, rk and exynos
even interpret these values differently in code?

>> 
>> Then we can use the data sheet limits of 90°C and 105°C in the trip point
>> table (which should not be tweaked for sensor inaccuracy).
> 
> I can see not compensating if it reads high, but if the temp reads
> low, shouldn't we compensate so we don't over-temp the processor?

I just mean that we must ensure that TJ is <= 90°C whenever the bandgap
reports 90°C. So it may report 10 or 20 or even 30 degrees more than the
real temperature, but never less (we may reach the critical temperature too
early, but never too late).

We can achieve that by adding bias or changing slope etc. in the bandgap sensor
driver.
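
A minimal sketch of what I mean, in C. Everything here is hypothetical: the
function name, the parameters and the millicelsius units are my assumptions,
and nothing is taken from the existing bandgap driver. The reported value is
the slope- and bias-corrected reading plus the worst-case sensor error, so
it can be too high but never too low.

```c
#include <assert.h>

/*
 * Hypothetical conversion from a raw bandgap reading to the value handed
 * to the thermal framework, all in millicelsius. slope_permille scales
 * the reading (1000 means unchanged), bias_mc shifts it, and max_error_mc
 * is the worst-case amount by which the sensor may read too low.
 */
int conservative_temp_mc(int raw_mc, int slope_permille, int bias_mc,
			 int max_error_mc)
{
	long long corrected;

	/* A negative margin would let us under-report, which we must avoid. */
	assert(max_error_mc >= 0);

	corrected = (long long)raw_mc * slope_permille / 1000 + bias_mc;
	return (int)(corrected + max_error_mc);
}
```

With an assumed worst-case error of 10°C, a raw reading of 80°C would be
reported as 90°C and hit the data sheet trip point, while the trip table
itself stays untouched.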

If I find some time I am curious enough to look into the code and the data
sheets to understand why it is said to be inaccurate... Maybe there is
jitter from some LDO and it needs a median filter?
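
In case jitter turns out to be the problem, a median filter could look
roughly like this. This is only a sketch under my own assumptions: the
window size MEDIAN_WINDOW and the function median_temp_mc() are invented
here, and I have not checked what the driver does today.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical maximum number of samples per filtered reading. */
#define MEDIAN_WINDOW 5

/* qsort() comparison for ints that cannot overflow. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Return the median of up to MEDIAN_WINDOW samples (millicelsius).
 * For an even count the upper of the two middle values is returned,
 * which errs towards reporting the higher temperature.
 */
int median_temp_mc(const int *samples, size_t n)
{
	int sorted[MEDIAN_WINDOW];

	assert(n > 0 && n <= MEDIAN_WINDOW);

	/* Sort a copy so the caller's sample buffer is left untouched. */
	memcpy(sorted, samples, n * sizeof(sorted[0]));
	qsort(sorted, n, sizeof(sorted[0]), cmp_int);
	return sorted[n / 2];
}
```

A single outlier among five samples then no longer moves the result, which
would be the point if the spikes really are short glitches.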

And why it seems to add a bias of ca. 10° as soon as I read it more than
once. And how well the reading correlates with ambient temperature
(it definitely correlates with cpufreq-set -f).

But we should not modify the trip temperatures by 10 or 20 or 30 degrees.
IMHO they should have the values defined by the data sheet.

BR,
Nikolaus
