[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fbdc3b56-4465-6d3e-74db-1d5082813b9c@linaro.org>
Date: Tue, 15 Jun 2021 18:18:21 +0200
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: Dmitry Osipenko <digetx@...il.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Thara Gopinath <thara.gopinath@...aro.org>
Cc: Thierry Reding <thierry.reding@...il.com>,
Jonathan Hunter <jonathanh@...dia.com>,
Zhang Rui <rui.zhang@...el.com>,
Amit Kucheria <amitk@...nel.org>,
Andreas Westman Dorcsak <hedmoo@...oo.com>,
Maxim Schwalm <maxim.schwalm@...il.com>,
Svyatoslav Ryhel <clamor95@...il.com>,
Ihor Didenko <tailormoon@...bler.ru>,
Ion Agorria <ion@...rria.com>,
Matt Merhar <mattmerhar@...tonmail.com>,
Peter Geis <pgwipeout@...il.com>, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30
thermal sensor
On 15/06/2021 15:01, Dmitry Osipenko wrote:
> 15.06.2021 13:26, Viresh Kumar пишет:
>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>
>>> [Cc Viresh]
>>>
>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>> level is breached, they also send signal to Power Management controller to
>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>> and a cooling device.
>>>
>>> IMO it does not make sense to expose the hardware throttling mechanism
>>> as a cooling device because it is not usable anywhere from the thermal
>>> framework.
>>>
>>> Moreover, that will collide with the thermal / cpufreq framework
>>> mitigation (hardware sets the frequency but the software thinks the freq
>>> is different), right ?
>
> H/w mitigation is additional and should be transparent to the software
> mitigation. The software mitigation is much more flexible, but it has
> latency. Software also could crash and hang.
>
> Hardware mitigation doesn't have latency and it will continue to work
> regardless of the software state.
Yes, I agree. Both solutions have their pros and cons. However, I don't
think they can co-exist sanely.
> The CCF driver is aware about the h/w cooling status [1], hence software
> sees the actual frequency.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660
Ah interesting, thanks for the pointer.
What I'm worried about is the consistency with cpufreq.
Probably cpufreq_update_limits() should be called from the interrupt
handler.
>> I am not even sure what the cooling device is doing here:
>>
>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>> changed it by itself. What is the benefit you are getting out of the
>> cooling device here ?
>
> It allows userspace to check whether hardware cooling is active via the
> cooling_device sysfs. Otherwise we don't have ability to check whether
> h/w cooling is active, I think it's a useful information. It's also
> interesting to see the cooling_device stats, showing how many times h/w
> mitigation was active.
Actually the stats are for software mitigation. For the driver, create a
debugfs entry like what do the other drivers or a module parameter with
the stats.
>>> The hardware limiter should let know the cpufreq framework about the
>>> frequency change.
>>>
>>> https://lkml.org/lkml/2021/6/8/1792
>>>
>>> May be post the sensor without the hw limiter for now and address that
>>> in a separate series ?
>>
>
> I wasn't aware about existence of the thermal pressure, thank you for
> pointing at it. At a quick glance it should be possible to benefit from
> the information about the additional pressure.
>
> Seems the current thermal pressure API assumes that there is only one
> user of the API. So it's impossible to aggregate the pressure from
> different sources, like software cpufreq pressure + h/w freq pressure.
> Correct? If yes, then please let me know yours thoughts about the best
> approach of supporting the aggregation.
That is a good question. IMO, first step would be to call
cpufreq_update_limits().
[ Cc Thara who implemented the thermal pressure ]
May be Thara has an idea about how to aggregate both? There is another
series floating around with hardware limiter [1] and the same problematic.
[1] https://lkml.org/lkml/2021/6/8/1791
> I'll factor out the h/w limiter from this patchset and prepare the v4.
> Thank you all for taking a look at the patches.
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Powered by blists - more mailing lists