lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 22 Jan 2019 18:12:51 -0800
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Amit Kucheria <amit.kucheria@...aro.org>
Cc:     linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
        bjorn.andersson@...aro.org, edubezval@...il.com,
        andy.gross@...aro.org, tdas@...eaurora.org, swboyd@...omium.org,
        dianders@...omium.org, David Brown <david.brown@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>, devicetree@...r.kernel.org
Subject: Re: [PATCH v3 1/1] arm64: dts: sdm845: wireup the thermal trip
 points to cpufreq

Hi Amit,

On Mon, Jan 21, 2019 at 11:38:34PM +0530, Amit Kucheria wrote:
> Since all cpus in the big and little clusters, respectively, are in the
> same frequency domain, use all of them for mitigation in the
> cooling-map. We end up with two cooling devices - one each for the big
> and little clusters.
> 
> We throttle lightly at the first trip point, just removing the boost
> frequency. At the next trip point we allow ourselves to be throttled to
> any extent.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@...aro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 225 +++++++++++++++++++++++++--
>  1 file changed, 209 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index c27cbd3bcb0a..878f661d16eb 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -13,6 +13,7 @@
>  #include <dt-bindings/reset/qcom,sdm845-aoss.h>
>  #include <dt-bindings/soc/qcom,rpmh-rsc.h>
>  #include <dt-bindings/clock/qcom,gcc-sdm845.h>
> +#include <dt-bindings/thermal/thermal.h>
>  
>  / {
>  	interrupt-parent = <&intc>;
> @@ -99,6 +100,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x0>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_0>;
>  			L2_0: l2-cache {
>  				compatible = "cache";
> @@ -114,6 +116,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x100>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_100>;
>  			L2_100: l2-cache {
>  				compatible = "cache";
> @@ -126,6 +129,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x200>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_200>;
>  			L2_200: l2-cache {
>  				compatible = "cache";
> @@ -138,6 +142,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x300>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_300>;
>  			L2_300: l2-cache {
>  				compatible = "cache";
> @@ -150,6 +155,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x400>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_400>;
>  			L2_400: l2-cache {
>  				compatible = "cache";
> @@ -162,6 +168,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x500>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_500>;
>  			L2_500: l2-cache {
>  				compatible = "cache";
> @@ -174,6 +181,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x600>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_600>;
>  			L2_600: l2-cache {
>  				compatible = "cache";
> @@ -186,6 +194,7 @@
>  			compatible = "qcom,kryo385";
>  			reg = <0x0 0x700>;
>  			enable-method = "psci";
> +			#cooling-cells = <2>;
>  			next-level-cache = <&L2_700>;
>  			L2_700: l2-cache {
>  				compatible = "cache";
> @@ -1691,18 +1700,41 @@
>  			thermal-sensors = <&tsens0 1>;
>  
>  			trips {
> -				cpu_alert0: trip0 {
> +				cpu0_alert1: trip-point@0 {
>  					temperature = <75000>;

In my observations a 'switch on/threshold' temperature of 75 degrees
leads to aggressive throttling with IPA when the temperature is above
this threshold:

[  716.760804] cpu_cooling_ratelimit: 31 callbacks suppressed
[  716.760836] cpu cpu4: Cooling state set to 10. New max freq = 1920000
[  716.773390] power_allocator_ratelimit: 15 callbacks suppressed
[  716.773405] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=73500, curr_temp=75200 total_requested_power=39025 total_granted_power=18654
[  749.609336] cpu_cooling_ratelimit: 45 callbacks suppressed
[  749.609371] cpu cpu4: Cooling state set to 11. New max freq = 1843200
[  749.624300] power_allocator_ratelimit: 24 callbacks suppressed
[  749.624323] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=70800, curr_temp=77200 total_requested_power=40136 total_granted_power=17402
[  780.152633] cpu_cooling_ratelimit: 41 callbacks suppressed
[  780.152666] cpu cpu4: Cooling state set to 11. New max freq = 1843200
[  780.165247] power_allocator_ratelimit: 21 callbacks suppressed
[  780.165261] thermal thermal_zone5: Controlling power: control_temp=95000 last_temp=64800, curr_temp=76900 total_requested_power=39719 total_granted_power=1759

(the logs come from a local patch in our tree:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/ec1c501a8093fed44a6697a5913ef2765f518e1f)

At this point I don't have a clear idea what would be a reasonable
value for the 'switch on/threshold' temperature, but probably it
should to be higher than 75 degrees, at least with IPA. If there is
no reasonable common configuration for different thermal governors I
guess we'll have to target a commonly used governor and systems
using other 'incompatible' governors need to override the config in
their <board>.dtsi.

I should also say that the system I'm testing on isn't a
representative environment (if such a thing exists at all...). It
isn't running an upstream kernel (it's a recent version though,
4.19). We try to stay as close to upstream as possible, however our
tree includes EAS related patches that affect thermal throttling which
haven't landed upstream yet. Also we currently use a guesstimated
value for 'dynamic-power-coefficient', which impacts IPA. And our
device doesn't have it's final thermal envelope yet, possible future
hardware changes (e.g. heatsink) may alter the behavior.

>  					hysteresis = <2000>;
>  					type = "passive";
>  				};
>  
> -				cpu_crit0: trip1 {
> +				cpu0_alert0: trip-point@1 {

The labels of the two trip points (cpu0_alert0 and cpu0_alert1) are
inverted.

> +					temperature = <95000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +
> +				cpu0_crit: cpu_crit {
>  					temperature = <110000>;
>  					hysteresis = <1000>;
>  					type = "critical";
>  				};
>  			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu0_alert0>;
> +					cooling-device = <&CPU0 THERMAL_NO_LIMIT 1>,
> +							 <&CPU1 THERMAL_NO_LIMIT 1>,
> +							 <&CPU2 THERMAL_NO_LIMIT 1>,
> +							 <&CPU3 THERMAL_NO_LIMIT 1>;
> +				};

With IPA this doesn't really limit throttling to the boost
frequency. Not sure if it has a negative impact, some other platforms
with a thermal configuration that targets IPA only have a cooling map
entry for the 'desired/target' temperature.

Cheers

Matthias

Powered by blists - more mailing lists