Message-ID: <2ed4c630-204a-4f80-a37f-f2ca838eb455@linaro.org>
Date: Wed, 3 Jul 2024 12:54:03 +0200
From: Neil Armstrong <neil.armstrong@...aro.org>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
 Linux PM <linux-pm@...r.kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>,
 LKML <linux-kernel@...r.kernel.org>, Lukasz Luba <lukasz.luba@....com>,
 Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
 Zhang Rui <rui.zhang@...el.com>,
 linux-arm-msm <linux-arm-msm@...r.kernel.org>
Subject: Re: [PATCH v2] thermal: core: Call monitor_thermal_zone() if zone
 temperature is invalid

Hi,

On 28/06/2024 14:10, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> 
> Commit 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip()
> if zone temperature is invalid") caused __thermal_zone_device_update()
> to return early if the current thermal zone temperature was invalid.
> 
> This was done to avoid running handle_thermal_trip() and governor
> callbacks in that case which led to confusion.  However, it went too
> far because monitor_thermal_zone() still needs to be called even when
> the zone temperature is invalid to ensure that it will be updated
> eventually in case thermal polling is enabled and the driver has no
> other means to notify the core of zone temperature changes (for example,
> it does not register an interrupt handler or ACPI notifier).
> 
> Also if the .set_trips() zone callback is expected to set up monitoring
> interrupts for a thermal zone, it has to be provided with valid
> boundaries and that can only happen if the zone temperature is known.
> 
> Accordingly, to ensure that __thermal_zone_device_update() will
> run again after a failing zone temperature check, make it call
> monitor_thermal_zone() regardless of whether or not the zone
> temperature is valid and make the latter schedule a thermal zone
> temperature update if the zone temperature is invalid even if
> polling is not enabled for the thermal zone.
> 
> Fixes: 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid")
> Reported-by: Daniel Lezcano <daniel.lezcano@...aro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>   drivers/thermal/thermal_core.c |    5 ++++-
>   drivers/thermal/thermal_core.h |    6 ++++++
>   2 files changed, 10 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -300,6 +300,8 @@ static void monitor_thermal_zone(struct
>   		thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
>   	else if (tz->polling_delay_jiffies)
>   		thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
> +	else if (tz->temperature == THERMAL_TEMP_INVALID)
> +		thermal_zone_device_set_polling(tz, msecs_to_jiffies(THERMAL_RECHECK_DELAY_MS));
>   }
>   
>   static struct thermal_governor *thermal_get_tz_governor(struct thermal_zone_device *tz)
> @@ -514,7 +516,7 @@ void __thermal_zone_device_update(struct
>   	update_temperature(tz);
>   
>   	if (tz->temperature == THERMAL_TEMP_INVALID)
> -		return;
> +		goto monitor;
>   
>   	tz->notify_event = event;
>   
> @@ -536,6 +538,7 @@ void __thermal_zone_device_update(struct
>   
>   	thermal_debug_update_trip_stats(tz);
>   
> +monitor:
>   	monitor_thermal_zone(tz);
>   }
>   
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -133,6 +133,12 @@ struct thermal_zone_device {
>   	struct thermal_trip_desc trips[] __counted_by(num_trips);
>   };
>   
> +/*
> + * Default delay after a failing thermal zone temperature check before
> + * attempting to check it again.
> + */
> +#define THERMAL_RECHECK_DELAY_MS	100
> +
>   /* Default Thermal Governor */
>   #if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE)
>   #define DEFAULT_THERMAL_GOVERNOR       "step_wise"
> 
> 
> 
> 

On next-20240702, this patch makes the Qualcomm HDK8350, HDK8450, QRD8550, HDK8550, QRD8650 and HDK8650 boards print the following in a loop:

thermal thermal_zoneXX: failed to read out thermal zone (-19)

Boot logs on ARM64 defconfig:
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152439#L1393
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152440#L2200
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152442#L2828
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152441#L1862
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152443#L1776
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/152444#L1723

Result of git bisect:
# bad: [82e4255305c554b0bb18b7ccf2db86041b4c8b6e] Add linux-next specific files for 20240702
# good: [22a40d14b572deb80c0648557f4bd502d7e83826] Linux 6.10-rc6
git bisect start 'FETCH_HEAD' 'v6.10-rc6'
# bad: [f6dfcf0e9567b57b93f2564966d9177f0d8dbe05] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad f6dfcf0e9567b57b93f2564966d9177f0d8dbe05
# good: [7f86ae0c2dc19fea7be1da29b2bf03f085463ae7] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
git bisect good 7f86ae0c2dc19fea7be1da29b2bf03f085463ae7
# bad: [077d5bbd75dd12af2096c96846ffc78ab5dd65b1] Merge branch 'devfreq-next' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git
git bisect bad 077d5bbd75dd12af2096c96846ffc78ab5dd65b1
# good: [271bcaf753d0afe2bd0386ab1e98132ee65b61ca] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git
git bisect good 271bcaf753d0afe2bd0386ab1e98132ee65b61ca
# good: [9758a2ee5316a6f8736ab4fd39a6f6176aa057ec] Merge branch 'hwmon-next' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
git bisect good 9758a2ee5316a6f8736ab4fd39a6f6176aa057ec
# good: [e6bd69ea345045520bd63487b85a4b5676aff76b] Merge branch 'master' of git://linuxtv.org/mchehab/media-next.git
git bisect good e6bd69ea345045520bd63487b85a4b5676aff76b
# good: [46398edfb36e2882be5e86ea563b2db9138ae499] Merge branches 'pm-cpuidle' and 'pm-powercap' into linux-next
git bisect good 46398edfb36e2882be5e86ea563b2db9138ae499
# good: [d3927cbc52eed166f74ea7e031ed6384cc3d4d5f] Merge branch 'thermal-intel' into linux-next
git bisect good d3927cbc52eed166f74ea7e031ed6384cc3d4d5f
# good: [ce84b7beeb524e7b20983838687862454ba54df7] cpufreq: sti: add missing MODULE_DEVICE_TABLE entry for stih418
git bisect good ce84b7beeb524e7b20983838687862454ba54df7
# bad: [fcf61315d38d41f4e55856b179f9e5538e299ef4] Merge branch 'thermal-fixes' into linux-next
git bisect bad fcf61315d38d41f4e55856b179f9e5538e299ef4
# good: [4262b8d782a74c7cf7b8b94ed9e4fcb94e856d1e] dt-bindings: thermal: mediatek: Fix thermal zone definition for MT8186
git bisect good 4262b8d782a74c7cf7b8b94ed9e4fcb94e856d1e
# good: [7eeb114a635a04bea2fa7d57cedbf374c714d29e] dt-bindings: thermal: convert hisilicon-thermal.txt to dt-schema
git bisect good 7eeb114a635a04bea2fa7d57cedbf374c714d29e
# good: [107ac0d49ae6a86b4986146b9a612294f7e34406] Merge branch 'thermal/linux-next' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into linux-next
git bisect good 107ac0d49ae6a86b4986146b9a612294f7e34406
# bad: [5725f40698b9ba7f84fbfee25b9059ba044c4b86] thermal: core: Call monitor_thermal_zone() if zone temperature is invalid
git bisect bad 5725f40698b9ba7f84fbfee25b9059ba044c4b86
# first bad commit: [5725f40698b9ba7f84fbfee25b9059ba044c4b86] thermal: core: Call monitor_thermal_zone() if zone temperature is invalid

#regzbot introduced: 5725f40698b9ba7f84fbfee25b9059ba044c4b86

Thanks,
Neil
