Message-ID: <CAJZ5v0g5ejtjYQ9t1O3tW+akmu_pWav9L=-Th5f6LYac7EG3Lw@mail.gmail.com>
Date: Thu, 16 May 2024 12:02:14 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Lukasz Luba <lukasz.luba@....com>, 
	"Rafael J. Wysocki" <rjw@...ysocki.net>, LKML <linux-kernel@...r.kernel.org>, 
	Linux PM <linux-pm@...r.kernel.org>, 
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>, Zhang Rui <rui.zhang@...el.com>
Subject: Re: [PATCH v1 1/6] thermal: sysfs: Trigger zone temperature updates
 on sysfs reads

Hi Daniel,

On Thu, May 16, 2024 at 11:46 AM Daniel Lezcano
<daniel.lezcano@...aro.org> wrote:
>
>
> Hi Rafael,
>
> On 16/05/2024 11:04, Rafael J. Wysocki wrote:
> > Hi Lukasz,
> >
> > On Mon, May 13, 2024 at 9:11 AM Lukasz Luba <lukasz.luba@....com> wrote:
> >>
> >> Hi Rafael,
> >>
> >> On 5/10/24 15:13, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >>>
> >>> Reading the zone temperature via sysfs causes the driver callback to
> >>> be invoked, but it does not cause the thermal zone object to be updated.
> >>>
> >>> This is problematic if the zone temperature read via sysfs differs from
> >>> the temperature value stored in the thermal zone object as it may cause
> >>> the kernel and user space to act against each other in some cases.
> >>>
> >>> For this reason, make temp_show() trigger a zone temperature update if
> >>> the temperature returned by thermal_zone_get_temp() is different from
> >>> the temperature value stored in the thermal zone object.
>
>
> The hwmon system is doing something similar and I'm not sure we want to
> mimic the same behavior.
>
> Just to summarize:
>
> 1. There is a polling delay set
>
> This polling delay gives the sampling rate the thermal zone is
> monitored. The temperature is updated at each 'delay' tick
>
> 2. There is no polling delay set
>
> The system relies on the interrupts to tell when a temperature reaches a
> threshold.

So this is a bit of a problem if the interrupts are not coming.

At least from the debugfs perspective, there are "mitigation episodes"
that last forever if the zone temperature happens to be above a trip
at the system resume time, say, and is never updated afterward.
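
To illustrate the stuck episode, here is a minimal user-space sketch in C.
It is not kernel code: struct zone_model, zone_update() and
episode_open_after() are hypothetical names made up for this example, and
the model assumes a single trip and an interrupt-only zone (no polling).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a thermal zone; not the kernel's data structure. */
struct zone_model {
    int cached_temp;  /* last temperature recorded by the zone (millidegrees C) */
    int trip_temp;    /* trip threshold (millidegrees C) */
    bool mitigating;  /* true while a "mitigation episode" is open */
};

/* Record a fresh reading and re-evaluate the episode state. */
static void zone_update(struct zone_model *z, int sensor_temp)
{
    assert(z);
    z->cached_temp = sensor_temp;
    z->mitigating = sensor_temp >= z->trip_temp;
}

/*
 * Resume with temp_at_resume, then let the sensor move to temp_later.
 * Without polling, the zone only learns about temp_later if an
 * interrupt is delivered. Returns whether the episode is still open.
 */
static bool episode_open_after(struct zone_model *z, int temp_at_resume,
                               int temp_later, bool interrupt_delivered)
{
    zone_update(z, temp_at_resume);
    if (interrupt_delivered)
        zone_update(z, temp_later);
    return z->mitigating;
}
```

In this model, an 85 C reading at resume against an 80 C trip opens an
episode that stays open however far the sensor cools, unless an interrupt
arrives to trigger another update.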

> On the other side, if the governor is in-kernel, then we should not read
> the temperature of the thermal zones because it is the job of the kernel
> to do that.
>
> Actually we can assume the temperature information exported to the
> userspace is a courtesy of the kernel when this one is managing the
> thermal zone.

It is not the case right now, though, as sysfs temperature reads
effectively bypass the whole in-kernel thermal management.
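
A rough sketch of the difference, again as a user-space model rather
than the actual temp_show() code: struct tz_model, show_bypass() and
show_and_sync() are hypothetical names, and the "update" is reduced to
refreshing a cached value and counting how often that happens.

```c
#include <assert.h>

/* Hypothetical model; the real zone object holds much more state. */
struct tz_model {
    int cached_temp;       /* temperature the in-kernel side acts on */
    unsigned int updates;  /* number of zone updates triggered so far */
};

/* Current behavior as described: return the sensor value, leave the zone alone. */
static int show_bypass(const struct tz_model *tz, int sensor_temp)
{
    assert(tz);
    return sensor_temp;
}

/*
 * Behavior proposed in the patch description: if the value just read
 * differs from the stored one, trigger a zone update before returning.
 */
static int show_and_sync(struct tz_model *tz, int sensor_temp)
{
    assert(tz);
    if (sensor_temp != tz->cached_temp) {
        tz->cached_temp = sensor_temp;
        tz->updates++;
    }
    return sensor_temp;
}
```

With show_bypass(), user space can see 60 C while the zone still holds
85 C. With show_and_sync(), the first such read brings the two views back
together and repeated reads of the same value trigger nothing further.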

> If there is no governor associated to the thermal zone because there is
> no cooling device associated to the defined trip points, then we can
> assume it is up to the userspace to monitor the thermal zone.

Well, in that case trips should not be taken into account, but they are now.

> Furthermore, the hwmon gives the temperature information with the
> caching and because of that it is not possible for a thermal daemon to
> correctly handle a thermal zone.
>
> That said, I would say we don't want the userspace to influence the
> thermal zone monitoring in any manner.
>
> From my POV, we should keep the code as it is.

Well, it is problematic as is.

> The description of the change says "it may cause the kernel and user
> space to act against each other in some cases". Is it possible to give
> the cases when that can happen?

This is mostly theoretical, but if user space knows that the
temperature has fallen below a trip and the kernel doesn't, the two
may decide to put a cooling device into different states.

In any case, the issue at hand is that thermal_zone_device_update()
processes passive and active trip points without attached cooling
devices, which doesn't make much sense.  That needs to be addressed
first, and the $subject patch may not make any difference once it is,
so please regard it as withdrawn.
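
One possible shape of that filtering, as a hedged user-space sketch and
not a proposed kernel change: enum trip_kind, struct trip_model,
trip_needs_processing() and count_crossed() are hypothetical names, and
the choice to keep processing hot and critical trips regardless of
cooling devices is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trip_kind { TRIP_ACTIVE, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

/* Hypothetical trip description for this example only. */
struct trip_model {
    enum trip_kind kind;
    int temp;               /* threshold in millidegrees C */
    unsigned int nr_cdevs;  /* cooling devices bound to this trip */
};

/* Passive and active trips only matter if something can cool the zone. */
static bool trip_needs_processing(const struct trip_model *t)
{
    bool mitigation_trip;

    assert(t);
    mitigation_trip = t->kind == TRIP_ACTIVE || t->kind == TRIP_PASSIVE;
    return !mitigation_trip || t->nr_cdevs > 0;
}

/* Count the trips at or below temp that would still be processed. */
static size_t count_crossed(const struct trip_model *trips, size_t n, int temp)
{
    size_t crossed = 0;

    for (size_t i = 0; i < n; i++)
        if (trip_needs_processing(&trips[i]) && temp >= trips[i].temp)
            crossed++;
    return crossed;
}
```

In this model, a passive trip with no cooling devices never starts a
mitigation episode, so it cannot leave one open either.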

Thanks!
