linux-kernel - Re: [PATCH 1/1] thermal: sysfs: avoid actual readings from sysfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b2e93db5-e6f8-a9d8-53de-af5ea750f0f0@linaro.org>
Date:   Wed, 7 Jun 2023 20:23:08 +0200
From:   Daniel Lezcano <daniel.lezcano@...aro.org>
To:     Eduardo Valentin <evalenti@...nel.org>
Cc:     eduval@...zon.com, rafael@...nel.org, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Amit Kucheria <amitk@...nel.org>,
        Zhang Rui <rui.zhang@...el.com>
Subject: Re: [PATCH 1/1] thermal: sysfs: avoid actual readings from sysfs

On 07/06/2023 18:38, Eduardo Valentin wrote:
> Hey Daniel,
> 
> Thanks for taking the time to read the patch.
> 
> On Wed, Jun 07, 2023 at 11:24:21AM +0200, Daniel Lezcano wrote:
>>
>>
>>
>> Hi Eduardo,
>>
>> On 07/06/2023 02:37, Eduardo Valentin wrote:
>>> From: Eduardo Valentin <eduval@...zon.com>
>>>
>>> As the thermal zone caches the current and last temperature
>>> value, the sysfs interface can use that instead of
>>> forcing an actual update or read from the device.
>>
>> If the read fails, userspace can handle that by using the previous
>> value. Do we really want to hide driver dysfunctions?
> 
> Good point.
> 
> In fact I thought of this exact problem. I sent only this patch,
> but it has more changes to come.
> 
> The next changes will replicate the current design of
> storing last_temperature in the thermal zone to also store
> the last return value, success or error, on the thermal zone
> too so that we can use here at the front end to report back
> to userspace when the reads are failing.
 >
> But yes, you are right, we do not want to keep reporting
> a successful read when the thermal zone thread has been
> failing to update the value, that needs to be reported
> up back to userspace.

IIUC, you want the temperature to be updated only by the polling thread 
and when the userspace reads the temperature, it reads a cached value, 
is that correct ?

>>> This way, if multiple userspace requests are coming
>>> in, we avoid storming the device with multiple reads
>>> and potentially clogging the timing requirement
>>> for the governors.

Sorry, I'm not convinced :/

>> Can you elaborate 'the timing requirement for the governors' ? I'm
>> missing the point
> 
> 
> The point is to avoid contention on the device update path.
> Governor that use differential equations on temperature over time
> will be very time sensitive. Step wise, power allocator, or any
> PID will be very sensitive to time. So, If userspace is hitting
> this API too often we can see cases where the updates needed to
> service userspace may defer/delay the execution of the governor
> logic.

AFAIK, reading a temperature value is usually between less than 1us and 
10us (depending on the sampling configuration in the driver).

I've done some measurements to read a temperature through sysfs and 
netlink. It is between 2us and 7us on a platforms where reading a 900ns 
latency sensor, sysfs being faster.

The time sensitive governor is the power allocator and usually, the 
sampling period is between 100ms and 250ms.

The thermal zones with fast thermal transitions may need faster sampling 
period but the hardware offloads the mitigation in this case by sampling 
every 100*us*, IIRC.

Given that, I'm not sure we are facing a design issue with thermal 
framework.

Do you have a use case with some measurements to spot an issue or is it 
a potential issue you identified ?

> Despite that, there is really no point to have more updates than
> what was configured for the thermal zone to support. Say that
> we configure a thermal zone to update itself every 500ms, yet
> userspace keeps sending reads every 100ms, we do not need necessarily
> to do a trip to the device every single time to update the temperature,
> as per the design for the thermal zone.

Sorry, I do not agree. The thermal zone is configured with a monitoring 
sampling period to detect trip point crossing and it can be configured 
without sampling period at all, just basing on trip point crossing 
interrupt.

The userspace has the right to read the current temperature it is 
interested in. For instance, the 'thermometer' in 
tools/thermal/thermometer may want to read the temperature at a high 
rate in order to profile the thermal zones with/without the mitigation 
kicking in.

Caching the values just break the current behavior.

>>> Cc: "Rafael J. Wysocki" <rafael@...nel.org> (supporter:THERMAL)
>>> Cc: Daniel Lezcano <daniel.lezcano@...aro.org> (supporter:THERMAL)
>>> Cc: Amit Kucheria <amitk@...nel.org> (reviewer:THERMAL)
>>> Cc: Zhang Rui <rui.zhang@...el.com> (reviewer:THERMAL)
>>> Cc: linux-pm@...r.kernel.org (open list:THERMAL)
>>> Cc: linux-kernel@...r.kernel.org (open list)
>>>
>>> Signed-off-by: Eduardo Valentin <eduval@...zon.com>
>>> ---
>>>    drivers/thermal/thermal_sysfs.c | 21 ++++++++++++++++-----
>>>    1 file changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
>>> index b6daea2398da..a240c58d9e08 100644
>>> --- a/drivers/thermal/thermal_sysfs.c
>>> +++ b/drivers/thermal/thermal_sysfs.c
>>> @@ -35,12 +35,23 @@ static ssize_t
>>>    temp_show(struct device *dev, struct device_attribute *attr, char *buf)
>>>    {
>>>        struct thermal_zone_device *tz = to_thermal_zone(dev);
>>> -     int temperature, ret;
>>> -
>>> -     ret = thermal_zone_get_temp(tz, &temperature);
>>> +     int temperature;
>>>
>>> -     if (ret)
>>> -             return ret;
>>> +     /*
>>> +      * don't force new update from external reads
>>> +      * This way we avoid messing up with time constraints.
>>> +      */
>>> +     if (tz->mode == THERMAL_DEVICE_DISABLED) {
>>> +             int r;
>>> +
>>> +             r = thermal_zone_get_temp(tz, &temperature); /* holds tz->lock*/
>>> +             if (r)
>>> +                     return r;
>>> +     } else {
>>> +             mutex_lock(&tz->lock);
>>> +             temperature = tz->temperature;
>>> +             mutex_unlock(&tz->lock);
>>> +     }
>>
>> No please, we are pushing since several weeks a lot of changes to
>> encapsulate the thermal zone device structure and prevent external core
>> components to use the internals directly. Even if we can consider the
>> thermal_sysfs as part of the core code, that changes is not sysfs related.
> 
> Can you clarify your concern, is it the direct access ? The lock ?
> what is the concern?
> 
> What is your suggestion here? Do you want me to write a helper
> function that gets tz->temperature without doing a ops->get_temp()?

The concern is the thermal framework code is not really in a good shape 
and the internals leaked around in the different drivers all around the 
subsystems, that led to drivers tampering with the thermal zone device 
structure data (including taking locks).

There are ongoing efforts to do some house cleaning around the thermal 
zone device structure encapsulation as well as getting ride of the 
different ops get_trip* by replacing that with a generic trip point 
structure to deal with in the thermal core code.

Those with the final objective to have the trip points ordered and 
handle the trip point crossing correctly detected (it is currently 
somehow broken).

Here the code does directly access tz->*.

Even not being convinced by the change proposal, I would have handle 
this value cache in the thermal_zone_get_temp() function directly in 
order to preserve the code self-encapsulation.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog