lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Dec 2023 14:42:34 +0100
From:   Mateusz Majewski <m.majewski2@...sung.com>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     Mateusz Majewski <m.majewski2@...sung.com>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Zhang Rui <rui.zhang@...el.com>,
        Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>,
        Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        linux-pm@...r.kernel.org, linux-samsung-soc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Various Exynos targets never return to no cooling

Hi,

> I understand your requirement for the interrupts only mode, but
> maybe till the moment there is no fix upstream, you can enable
> it as well?

We (actually Marek and independently another coworker) had an idea how
to solve this while still avoiding polling all the time, and it turned
out to be quite simple to implement (PoC-quality). The idea was to run
several cycles of polling after each interrupt. This could be done like
this:

diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c
index 6482513bfe66..b4bffe405194 100644
--- a/drivers/thermal/samsung/exynos_tmu.c
+++ b/drivers/thermal/samsung/exynos_tmu.c
@@ -760,6 +760,12 @@ static irqreturn_t exynos_tmu_threaded_irq(int irq, void *id)
 {
 	struct exynos_tmu_data *data = id;
 
+	/* TODO: would need some API */
+	mutex_lock(&data->tzd->lock);
+	data->tzd->additional_poll_reps = 10;
+	data->tzd->additional_poll_jiffies = HZ / 10;
+	mutex_unlock(&data->tzd->lock);
+
 	thermal_zone_device_update(data->tzd, THERMAL_EVENT_UNSPECIFIED);
 
 	mutex_lock(&data->lock);
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 625ba07cbe2f..c825d068402f 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -299,12 +299,24 @@ static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
 
 static void monitor_thermal_zone(struct thermal_zone_device *tz)
 {
+	unsigned long delay;
+
 	if (tz->mode != THERMAL_DEVICE_ENABLED)
-		thermal_zone_device_set_polling(tz, 0);
+		delay = 0;
 	else if (tz->passive)
-		thermal_zone_device_set_polling(tz, tz->passive_delay_jiffies);
+		delay = tz->passive_delay_jiffies;
 	else if (tz->polling_delay_jiffies)
-		thermal_zone_device_set_polling(tz, tz->polling_delay_jiffies);
+		delay = tz->polling_delay_jiffies;
+	else
+		delay = 0; /* TODO: ??? */
+
+	if (tz->additional_poll_reps > 0) {
+		tz->additional_poll_reps -= 1;
+		if (delay == 0 || tz->additional_poll_jiffies < delay)
+			delay = tz->additional_poll_jiffies;
+	}
+
+	thermal_zone_device_set_polling(tz, delay);
 }
 
 static void handle_non_critical_trips(struct thermal_zone_device *tz,
@@ -425,6 +437,8 @@ static void thermal_zone_device_init(struct thermal_zone_device *tz)
 	tz->temperature = THERMAL_TEMP_INVALID;
 	tz->prev_low_trip = -INT_MAX;
 	tz->prev_high_trip = INT_MAX;
+	tz->additional_poll_jiffies = 0;
+	tz->additional_poll_reps = 0;
 	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
 		pos->initialized = false;
 }
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index c7190e2dfcb4..576b1f3ef25d 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -172,6 +172,8 @@ struct thermal_zone_device {
 	int passive;
 	int prev_low_trip;
 	int prev_high_trip;
+	int additional_poll_reps;
+	unsigned long additional_poll_jiffies;
 	atomic_t need_update;
 	struct thermal_zone_device_ops *ops;
 	struct thermal_zone_params *tzp;

In my tests this is enough to resolve the issue consistently on both
TM2E and XU4, both before and after my other patchset.

To be honest, this is not the most elegant solution probably and it
still doesn't really take into account the governor needs. Therefore, if

> Regarding this topic, I just wanted to tell you that I had conversation
> with Rafael & Daniel last Fri. Rafael gave me a hint to his latest work
> in his repo regarding potentially similar race with temperature value.

brings a better solution, it would be great :)

Thank you!
Mateusz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ