lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4c7b23c4-cf6a-0942-5250-63515be4a219@gmail.com>
Date:   Tue, 15 Jun 2021 22:32:41 +0300
From:   Dmitry Osipenko <digetx@...il.com>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Thara Gopinath <thara.gopinath@...aro.org>
Cc:     Thierry Reding <thierry.reding@...il.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Zhang Rui <rui.zhang@...el.com>,
        Amit Kucheria <amitk@...nel.org>,
        Andreas Westman Dorcsak <hedmoo@...oo.com>,
        Maxim Schwalm <maxim.schwalm@...il.com>,
        Svyatoslav Ryhel <clamor95@...il.com>,
        Ihor Didenko <tailormoon@...bler.ru>,
        Ion Agorria <ion@...rria.com>,
        Matt Merhar <mattmerhar@...tonmail.com>,
        Peter Geis <pgwipeout@...il.com>, devicetree@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org,
        linux-pm@...r.kernel.org
Subject: Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30
 thermal sensor

15.06.2021 19:18, Daniel Lezcano пишет:
> On 15/06/2021 15:01, Dmitry Osipenko wrote:
>> 15.06.2021 13:26, Viresh Kumar пишет:
>>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>>
>>>> [Cc Viresh]
>>>>
>>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>>> level is breached, they also send signal to Power Management controller to
>>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>>> and a cooling device.
>>>>
>>>> IMO it does not make sense to expose the hardware throttling mechanism
>>>> as a cooling device because it is not usable anywhere from the thermal
>>>> framework.
>>>>
>>>> Moreover, that will collide with the thermal / cpufreq framework
>>>> mitigation (hardware sets the frequency but the software thinks the freq
>>>> is different), right ?
>>
>> H/w mitigation is additional and should be transparent to the software
>> mitigation. The software mitigation is much more flexible, but it has
>> latency. Software also could crash and hang.
>>
>> Hardware mitigation doesn't have latency and it will continue to work
>> regardless of the software state.
> 
> Yes, I agree. Both solutions have their pros and cons. However, I don't
> think they can co-exist sanely.
> 
>> The CCF driver is aware about the h/w cooling status [1], hence software
>> sees the actual frequency.
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660
> 
> Ah interesting, thanks for the pointer.
> 
> What I'm worried about is the consistency with cpufreq.
> 
> Probably cpufreq_update_limits() should be called from the interrupt
> handler.

IIUC, the cpufreq already should be prepared for the case where firmware
may override frequency. Viresh, could you please clarify what are the
possible implications of the frequency overriding?

>>> I am not even sure what the cooling device is doing here:
>>>
>>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>>> changed it by itself. What is the benefit you are getting out of the
>>> cooling device here ?
>>
>> It allows userspace to check whether hardware cooling is active via the
>> cooling_device sysfs. Otherwise we don't have ability to check whether
>> h/w cooling is active, I think it's a useful information. It's also
>> interesting to see the cooling_device stats, showing how many times h/w
>> mitigation was active.
> 
> Actually the stats are for software mitigation. For the driver, create a
> debugfs entry like what do the other drivers or a module parameter with
> the stats.

Okay

>>>> The hardware limiter should let know the cpufreq framework about the
>>>> frequency change.
>>>>
>>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>>
>>>> May be post the sensor without the hw limiter for now and address that
>>>> in a separate series ?
>>>
>>
>> I wasn't aware about existence of the thermal pressure, thank you for
>> pointing at it. At a quick glance it should be possible to benefit from
>> the information about the additional pressure.
>>
>> Seems the current thermal pressure API assumes that there is only one
>> user of the API. So it's impossible to aggregate the pressure from
>> different sources, like software cpufreq pressure + h/w freq pressure.
>> Correct? If yes, then please let me know yours thoughts about the best
>> approach of supporting the aggregation.
> 
> That is a good question. IMO, first step would be to call
> cpufreq_update_limits().

Right

> [ Cc Thara who implemented the thermal pressure ]
> 
> May be Thara has an idea about how to aggregate both? There is another
> series floating around with hardware limiter [1] and the same problematic.
> 
>  [1] https://lkml.org/lkml/2021/6/8/1791

Thanks, it indeed looks similar.

I guess the common thermal pressure update code could be moved out into
a new special cpufreq thermal QoS handler (policy->thermal_constraints),
where handler will select the frequency constraint and set up the
pressure accordingly. So there won't be any races in the code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ