lists.openwall.net
Open Source and information security mailing list archives

Message-Id: <97204F98-FA33-4EBA-80AC-2FB3A6E78B2B@goldelico.com>
Date:   Sat, 14 Sep 2019 16:38:30 +0200
From:   "H. Nikolaus Schaller" <hns@...delico.com>
To:     Adam Ford <aford173@...il.com>
Cc:     Linux-OMAP <linux-omap@...r.kernel.org>,
        Adam Ford <adam.ford@...icpd.com>, Nishanth Menon <nm@...com>,
        Benoît Cousson <bcousson@...libre.com>,
        Tony Lindgren <tony@...mide.com>,
        Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        devicetree <devicetree@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Grazvydas Ignotas <notasas@...il.com>
Subject: Re: [RFC v2 1/2] ARM: dts: omap3: Add cpu trips and cooling map for omap3 family


> On 14.09.2019 at 15:42, Adam Ford <aford173@...il.com> wrote:
> 
> On Sat, Sep 14, 2019 at 4:20 AM H. Nikolaus Schaller <hns@...delico.com> wrote:
>> 
>> 
>>> On 13.09.2019 at 17:37, Adam Ford <aford173@...il.com> wrote:
>>> 
>>> The OMAP3530, AM3517 and DM3730 all show thresholds of 90C and 105C
>>> depending on commercial or industrial temperature ratings.  This
>>> patch expands the thermal information to the limits of 90C and 105C
>>> for the alert and critical trips.
>>> 
>>> For boards that never use industrial temperatures, these can be
>>> changed in their respective device trees with something like:
>>> 
>>> &cpu_alert0 {
>>>      temperature = <85000>; /* millicelsius */
>>> };
>>> 
>>> &cpu_crit {
>>>      temperature = <90000>; /* millicelsius */
>>> };
>>> 
>>> Signed-off-by: Adam Ford <aford173@...il.com>
>>> ---
>>> V2:  Change the CPU reference to &cpu instead of &cpu0
>>> 
>>> diff --git a/arch/arm/boot/dts/omap3-cpu-thermal.dtsi b/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
>>> index 235ecfd61e2d..dfbd0cb0b00b 100644
>>> --- a/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
>>> +++ b/arch/arm/boot/dts/omap3-cpu-thermal.dtsi
>>> @@ -17,4 +17,25 @@ cpu_thermal: cpu_thermal {
>>> 
>>>                      /* sensor       ID */
>>>      thermal-sensors = <&bandgap     0>;
>>> +
>>> +     cpu_trips: trips {
>>> +             cpu_alert0: cpu_alert {
>>> +                     temperature = <90000>; /* millicelsius */
>>> +                     hysteresis = <2000>; /* millicelsius */
>>> +                     type = "passive";
>>> +             };
>>> +             cpu_crit: cpu_crit {
>>> +                     temperature = <105000>; /* millicelsius */
>>> +                     hysteresis = <2000>; /* millicelsius */
>>> +                     type = "critical";
>>> +             };
>>> +     };
>>> +
>>> +     cpu_cooling_maps: cooling-maps {
>>> +             map0 {
>>> +                     trip = <&cpu_alert0>;
>>> +                     cooling-device =
>>> +                             <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
>>> +             };
>>> +     };
>>> };
>>> --
>>> 2.17.1
>>> 
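A minimal Python sketch of how the two trips in this patch are meant to interact, assuming the usual thermal-framework convention that a trip becomes active when the reported temperature reaches `temperature` and is released only once it drops below `temperature - hysteresis` (88°C for `cpu_alert` here). The `Trip` class, `actions_for()` helper and the "throttle"/"shutdown" action strings are hypothetical illustrations, not kernel code; the real behaviour depends on the thermal governor and kernel version.

```python
from dataclasses import dataclass

# Simplified model of the trips added by the patch (values in millicelsius).
# Assumption: a trip activates when temp >= temperature and is released only
# when temp < temperature - hysteresis. Hypothetical, not kernel code.


@dataclass
class Trip:
    name: str
    temperature: int  # trip temperature, millicelsius
    hysteresis: int   # release margin, millicelsius
    kind: str         # "passive" (throttle) or "critical" (shut down)
    active: bool = False

    def update(self, temp: int) -> bool:
        """Update the trip state for one sensor reading and return it."""
        if temp >= self.temperature:
            self.active = True
        elif temp < self.temperature - self.hysteresis:
            self.active = False
        # Readings inside the hysteresis band keep the previous state.
        return self.active


def actions_for(trips, temp):
    """Return the hypothetical actions triggered by one reading."""
    actions = []
    for trip in trips:
        if trip.update(temp):
            actions.append("shutdown" if trip.kind == "critical" else "throttle")
    return actions


trips = [
    Trip("cpu_alert", temperature=90000, hysteresis=2000, kind="passive"),
    Trip("cpu_crit", temperature=105000, hysteresis=2000, kind="critical"),
]
for reading in (88000, 90000, 89000, 87000, 105000):
    print(reading, actions_for(trips, reading))
# 88000 []
# 90000 ['throttle']
# 89000 ['throttle']   (still inside the 88-90 °C band)
# 87000 []
# 105000 ['throttle', 'shutdown']
```

In this model the 2°C hysteresis keeps the passive trip engaged between 88°C and 90°C, while the critical trip only matters if passive cooling fails to hold the temperature.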
>> 
>> Here is my test log (GTA04A5 with DM3730CBP100).
>> The "high-load" script drives the NEON to full power
>> and would report any calculation errors.
>> 
>> No noise induced by power supply fluctuations is visible
>> in the bandgap sensor data (the log shows the system
>> voltage while charging).
>> 
> 
> Great data!
> 
>> root@...ux:~# ./high-load -n2
>> 100% load stress test for 1 cores running ./neon_loop2
>> Sat Sep 14 09:05:50 UTC 2019 65° 4111mV 1000MHz
>> Sat Sep 14 09:05:50 UTC 2019 67° 4005mV 1000MHz
>> Sat Sep 14 09:05:52 UTC 2019 68° 4000mV 1000MHz
>> Sat Sep 14 09:05:53 UTC 2019 68° 4000mV 1000MHz
>> Sat Sep 14 09:05:55 UTC 2019 72° 3976mV 1000MHz
>> Sat Sep 14 09:05:56 UTC 2019 72° 4023mV 1000MHz
>> Sat Sep 14 09:05:57 UTC 2019 72° 3900mV 1000MHz
>> Sat Sep 14 09:05:59 UTC 2019 73° 4029mV 1000MHz
>> Sat Sep 14 09:06:00 UTC 2019 73° 3988mV 1000MHz
>> Sat Sep 14 09:06:01 UTC 2019 73° 4005mV 1000MHz
>> Sat Sep 14 09:06:03 UTC 2019 73° 4011mV 1000MHz
>> Sat Sep 14 09:06:04 UTC 2019 73° 4117mV 1000MHz
>> Sat Sep 14 09:06:06 UTC 2019 73° 4005mV 1000MHz
>> Sat Sep 14 09:06:07 UTC 2019 75° 3994mV 1000MHz
>> Sat Sep 14 09:06:08 UTC 2019 75° 3970mV 1000MHz
>> Sat Sep 14 09:06:09 UTC 2019 75° 4046mV 1000MHz
>> Sat Sep 14 09:06:11 UTC 2019 75° 4005mV 1000MHz
>> Sat Sep 14 09:06:12 UTC 2019 75° 4023mV 1000MHz
>> Sat Sep 14 09:06:14 UTC 2019 75° 3970mV 1000MHz
>> Sat Sep 14 09:06:15 UTC 2019 75° 4011mV 1000MHz
>> Sat Sep 14 09:06:16 UTC 2019 77° 4017mV 1000MHz
>> Sat Sep 14 09:06:18 UTC 2019 77° 3994mV 1000MHz
>> Sat Sep 14 09:06:19 UTC 2019 77° 3994mV 1000MHz
>> Sat Sep 14 09:06:20 UTC 2019 77° 3988mV 1000MHz
>> Sat Sep 14 09:06:22 UTC 2019 77° 4023mV 1000MHz
>> Sat Sep 14 09:06:23 UTC 2019 77° 4023mV 1000MHz
>> Sat Sep 14 09:06:24 UTC 2019 78° 4005mV 1000MHz
>> Sat Sep 14 09:06:26 UTC 2019 78° 4105mV 1000MHz
>> Sat Sep 14 09:06:27 UTC 2019 78° 4011mV 1000MHz
>> Sat Sep 14 09:06:28 UTC 2019 78° 3994mV 1000MHz
>> Sat Sep 14 09:06:30 UTC 2019 78° 4123mV 1000MHz
>> ...
>> Sat Sep 14 09:09:57 UTC 2019 88° 4082mV 1000MHz
>> Sat Sep 14 09:09:59 UTC 2019 88° 4164mV 1000MHz
>> Sat Sep 14 09:10:00 UTC 2019 88° 4058mV 1000MHz
>> Sat Sep 14 09:10:01 UTC 2019 88° 4058mV 1000MHz
>> Sat Sep 14 09:10:03 UTC 2019 88° 4082mV 1000MHz
>> Sat Sep 14 09:10:04 UTC 2019 88° 4058mV 1000MHz
>> Sat Sep 14 09:10:06 UTC 2019 88° 4146mV 1000MHz
>> Sat Sep 14 09:10:07 UTC 2019 88° 4041mV 1000MHz
>> Sat Sep 14 09:10:08 UTC 2019 88° 4035mV 1000MHz
>> Sat Sep 14 09:10:10 UTC 2019 88° 4052mV 1000MHz
>> Sat Sep 14 09:10:11 UTC 2019 88° 4087mV 1000MHz
>> Sat Sep 14 09:10:12 UTC 2019 88° 4152mV 1000MHz
>> Sat Sep 14 09:10:14 UTC 2019 88° 4070mV 1000MHz
>> Sat Sep 14 09:10:15 UTC 2019 88° 4064mV 1000MHz
>> Sat Sep 14 09:10:17 UTC 2019 88° 4170mV 1000MHz
>> Sat Sep 14 09:10:18 UTC 2019 88° 4058mV 1000MHz
>> Sat Sep 14 09:10:19 UTC 2019 88° 4187mV 1000MHz
>> Sat Sep 14 09:10:21 UTC 2019 88° 4093mV 1000MHz
>> Sat Sep 14 09:10:22 UTC 2019 88° 4087mV 1000MHz
>> Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz
> 
> Should we be a little more conservative?  Without knowing the sensor
> accuracy, I believe we do not want to run at 800MHz or 1GHz at 90C, so if
> we made this value 89 instead of 90, we would throttle a little more
> conservatively.

Well, the OMAP5 also defines exactly 100°C in the device tree.

I would assume that the bandgap sensor's accuracy is such that it
never reports less than the real temperature. So if we
throttle at a reported 90°C, TJ is likely lower.

>> Sat Sep 14 09:10:25 UTC 2019 88° 4123mV 800MHz
>> Sat Sep 14 09:10:26 UTC 2019 88° 4064mV 1000MHz
>> Sat Sep 14 09:10:28 UTC 2019 90° 4058mV 1000MHz
> 
> Again here, if I interpret the data sheet correctly, we're technically out of spec.

I read the data sheet as saying that 90°C at OPP1G is still within spec.
91°C would obviously be outside it (if the bandgap sensor is precise).

> 
>> Sat Sep 14 09:10:29 UTC 2019 88° 4076mV 1000MHz
>> Sat Sep 14 09:10:30 UTC 2019 88° 4064mV 1000MHz
>> Sat Sep 14 09:10:32 UTC 2019 88° 4117mV 1000MHz
>> Sat Sep 14 09:10:33 UTC 2019 88° 4105mV 800MHz
>> Sat Sep 14 09:10:34 UTC 2019 88° 4070mV 1000MHz
>> Sat Sep 14 09:10:36 UTC 2019 88° 4076mV 1000MHz
>> Sat Sep 14 09:10:37 UTC 2019 88° 4087mV 1000MHz
>> Sat Sep 14 09:10:39 UTC 2019 88° 4017mV 1000MHz
>> Sat Sep 14 09:10:40 UTC 2019 88° 4093mV 1000MHz
>> Sat Sep 14 09:10:41 UTC 2019 88° 4058mV 800MHz
>> Sat Sep 14 09:10:42 UTC 2019 88° 4035mV 1000MHz
>> Sat Sep 14 09:10:44 UTC 2019 90° 4058mV 1000MHz
>> Sat Sep 14 09:10:45 UTC 2019 88° 4064mV 1000MHz
>> Sat Sep 14 09:10:47 UTC 2019 88° 4064mV 1000MHz
>> Sat Sep 14 09:10:48 UTC 2019 88° 4029mV 1000MHz
>> Sat Sep 14 09:10:50 UTC 2019 90° 4046mV 1000MHz
>> ^Ckill 4680
>> root@...ux:~# cpufreq-info
>> cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
>> Report errors and bugs to cpufreq@...r.kernel.org, please.
>> analyzing CPU 0:
>>  driver: cpufreq-dt
>>  CPUs which run at the same hardware frequency: 0
>>  CPUs which need to have their frequency coordinated by software: 0
>>  maximum transition latency: 300 us.
>>  hardware limits: 300 MHz - 1000 MHz
>>  available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
>>  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
>>  current policy: frequency should be within 300 MHz and 1000 MHz.
>>                  The governor "ondemand" may decide which speed to use
>>                  within this range.
>>  current CPU frequency is 600 MHz (asserted by call to hardware).
>>  cpufreq stats: 300 MHz:22.81%, 600 MHz:2.50%, 800 MHz:2.10%, 1000 MHz:72.59%  (1563)
>> root@...ux:~#
>> 
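The cooling map in the patch passes `THERMAL_NO_LIMIT` as both the lower and upper cooling-state limit for `&cpu`, so throttling may use every frequency step that cpufreq-info lists above. A hypothetical Python sketch of that mapping, assuming the cpufreq cooling convention that state 0 is the highest frequency and each higher state selects the next lower step, and that `THERMAL_NO_LIMIT` resolves to state 0 for the lower limit and to the device's maximum state for the upper limit. The helper names are illustrative only.

```python
# Hypothetical model of a cpufreq cooling device built from the frequency
# steps reported by cpufreq-info (300, 600, 800, 1000 MHz). Assumptions:
#  - cooling state 0 = highest frequency (no cooling); each higher state
#    selects the next lower frequency step;
#  - THERMAL_NO_LIMIT means "state 0" as a lower limit and "the device's
#    maximum state" as an upper limit.

THERMAL_NO_LIMIT = None  # stand-in, not the DT binding's real numeric value


def cooling_states(freqs_mhz):
    """Map cooling state -> frequency in MHz, highest frequency first."""
    return dict(enumerate(sorted(freqs_mhz, reverse=True)))


def clamp_state(state, lower, upper, max_state):
    """Clamp a requested state to the [lower, upper] range of a cooling map."""
    lo = 0 if lower is THERMAL_NO_LIMIT else lower
    hi = max_state if upper is THERMAL_NO_LIMIT else upper
    return max(lo, min(state, hi))


states = cooling_states([300, 600, 800, 1000])
max_state = max(states)  # highest state index, 3 here
# One throttling step above state 0 selects the next lower frequency.
first_step = clamp_state(1, THERMAL_NO_LIMIT, THERMAL_NO_LIMIT, max_state)
print(states[first_step])  # 800
```

Under these assumptions a single throttling step lands on 800 MHz, which is consistent with the 800MHz samples in the log.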
>> So the OPP is reduced when the bandgap sensor reports >= 90°C,
>> which almost immediately brings the temperature
>> down.
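To check that observation against the log, a rough Python sketch that parses lines in the format printed by the `high-load` script (timestamp, bandgap temperature, system voltage, CPU frequency) and pairs each reading at or above 90°C with the frequency of the next sample. The line format is inferred from the excerpt; `LINE_RE`, `parse()` and `freq_after_trip()` are hypothetical helpers, and the sample lines are copied from the log above.

```python
import re
from typing import NamedTuple

# Line format inferred from the log, e.g.
#   "Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz"
# (timestamp, bandgap temperature, system voltage, CPU frequency).
# The regex and helper names are hypothetical, not part of "high-load".
LINE_RE = re.compile(
    r"^(?P<ts>.+?)\s+(?P<temp>\d+)°\s+(?P<mv>\d+)mV\s+(?P<mhz>\d+)MHz$"
)


class Sample(NamedTuple):
    ts: str
    temp_c: int
    mv: int
    mhz: int


def parse(lines):
    """Return a Sample for every line matching the expected format."""
    samples = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            samples.append(
                Sample(m["ts"], int(m["temp"]), int(m["mv"]), int(m["mhz"]))
            )
    return samples


def freq_after_trip(samples, trip_c=90):
    """For each reading >= trip_c, pair its timestamp with the next sample's MHz."""
    return [
        (cur.ts, nxt.mhz)
        for cur, nxt in zip(samples, samples[1:])
        if cur.temp_c >= trip_c
    ]


excerpt = """\
Sat Sep 14 09:10:22 UTC 2019 88° 4087mV 1000MHz
Sat Sep 14 09:10:23 UTC 2019 90° 4070mV 1000MHz
Sat Sep 14 09:10:25 UTC 2019 88° 4123mV 800MHz
Sat Sep 14 09:10:26 UTC 2019 88° 4064mV 1000MHz
"""
print(freq_after_trip(parse(excerpt.splitlines())))
# [('Sat Sep 14 09:10:23 UTC 2019', 800)]
```

Over the full log, not every 90° reading is followed by an 800MHz sample (09:10:44 is followed by 88° at 1000MHz), presumably because the short throttling episodes can fall between samples.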
>> 
>> No operational hiccups were observed.
>> 
>> The surface temperature of the PoP chip rose to
>> approx. 53°C during this test.
>> 
>> Tested-by: H. Nikolaus Schaller <hns@...delico.com> # on GTA04A5 with dm3730cbp100
>> 

BTW: this patch (set) is independent of my 1GHz OPP patches.
It should also work with OPP-v1 definitions, so the maintainers can
decide which one to apply first.

It is just more difficult to reach a TJ of 90°C without the 1GHz OPP.

BR,
Nikolaus
