[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <59d087c3-1a4a-782f-8566-313e8355216c@quicinc.com>
Date: Wed, 15 Jan 2025 00:46:50 +0530
From: Manaf Meethalavalappu Pallikunhi <quic_manafm@...cinc.com>
To: Dmitry Baryshkov <dmitry.baryshkov@...aro.org>
CC: Wasim Nazir <quic_wasimn@...cinc.com>,
Bjorn Andersson
<andersson@...nel.org>,
Konrad Dybcio <konradybcio@...nel.org>, Rob Herring
<robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley
<conor+dt@...nel.org>, <linux-arm-msm@...r.kernel.org>,
<devicetree@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<kernel@...cinc.com>, <psodagud@...cinc.com>
Subject: Re: [PATCH v5 6/6] arm64: dts: qcom: Enable cpu cooling devices for
QCS9075 platforms
Hi Dmitry,
Sorry, I was busy with some other priority tasks.
On 1/10/2025 5:24 AM, Dmitry Baryshkov wrote:
> On Wed, Jan 08, 2025 at 09:38:19PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
>> Hi Dmitry,
>>
>>
>> On 1/8/2025 6:16 PM, Dmitry Baryshkov wrote:
>>> On Wed, Jan 08, 2025 at 05:57:06PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
>>>> Hi Dmitry,
>>>>
>>>>
>>>> On 1/3/2025 11:21 AM, Dmitry Baryshkov wrote:
>>>>> On Tue, Dec 31, 2024 at 05:31:41PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
>>>>>> Hi Dmitry,
>>>>>>
>>>>>> On 12/30/2024 9:10 PM, Dmitry Baryshkov wrote:
>>>>>>> On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:
>>>>>>>> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@...cinc.com>
>>>>>>>>
>>>>>>>> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
>>>>>>>> does corrective action for each subsystem based on sensor violation
>>>>>>>> to comply safety standards. But as QCS9075 is non-safe SoC it
>>>>>>>> requires conventional thermal mitigation to control thermal for
>>>>>>>> different subsystems.
>>>>>>>>
>>>>>>>> The cpu frequency throttling for different cpu tsens is enabled in
>>>>>>>> hardware as first defense for cpu thermal control. But QCS9075 SoC
>>>>>>>> has higher ambient specification. During high ambient condition, even
>>>>>>>> lowest frequency with multi cores can slowly build heat over the time
>>>>>>>> and it can lead to thermal run-away situations. This patch restrict
>>>>>>>> cpu cores during this scenario helps further thermal control and
>>>>>>>> avoids thermal critical violation.
>>>>>>>>
>>>>>>>> Add cpu idle injection cooling bindings for cpu tsens thermal zones
>>>>>>>> as a mitigation for cpu subsystem prior to thermal shutdown.
>>>>>>>>
>>>>>>>> Add cpu frequency cooling devices that will be used by userspace
>>>>>>>> thermal governor to mitigate skin thermal management.
>>>>>>> Does anything prevent us from having this config as a part of the basic
>>>>>>> sa8775p.dtsi setup? If HW is present in the base version but it is not
>>>>>>> accessible for whatever reason, please move it the base device config
>>>>>>> and use status "disabled" or "reserved" to the respective board files.
>>>>>> Sure, I will move idle injection node for each cpu to sa8775p.dtsi and keep
>>>>>> it disabled state. #cooling cells property for CPU, still wanted to keep it
>>>>>> in board files as we don't want to enable any cooling device in base DT.
>>>>> "we don't want" is not a proper justification. So, no.
>>>> As noted in the commit, thermal cooling mitigation is only necessary for
>>>> non-safe SoCs. Adding this cooling cell property to the CPU node in the base
>>>> DT (sa8775p.dtsi), which is shared by both safe and non-safe SoCs, would
>>>> violate the requirements for safe SoCs. Therefore, we will include it only
>>>> in non-safe SoC boards.
>>> "is only necessary" is fine. It means that it is an optional part which
>>> is going to be unused / ignored / duplicate functionality on the "safe"
>>> SoCs. What kind of requirement is going to be violated in this way?
>> From the perspective of a safe SoC, any software mitigation that compromises
>> the safety subsystem’s compliance should not be allowed. Enabling the
>> cooling device also opens up the sysfs interface for userspace, which we may
>> not fully control.
> THere are a lot of interfaces exported to the userspace.
>
>> Userspace apps or partner apps might inadvertently use
>> it. Therefore, we believe it is better not to expose such an interface, as
>> it is not required for that SoC and helps to avoid opening up an interface
>> that could potentially lead to a safety failure.
> How can thermal mitigation interface lead to safety failure? Userspace
> can possibly lower trip points, but it can not override existing
> firmware-based mitigation.
Both firmware and software passive mitigations (CPU/GPU) are not
permitted for Safe SoC. As mentioned in the commit, it is the
responsibility of the safety subsystem to take corrective action for
high temperatures. One of the safety requirements (not a functional
requirement) for Safe SoC is to avoid scaling of CPUs and bus DCVS, as
this could impact safety-critical workloads. Therefore, Safe SoC will
operate at maximum frequency with the performance governor. Enabling a
cooling device for the CPU would expose the cooling device sysfs
interface, allowing userspace to request different cooling states via
the cooling device cur_state sysfs, which could potentially lower the
frequency and violate the safety requirement.
> And if there is a known problem with the interface, it should be fixed
> instead.
There is no interface issue, as it is not required and can be disabled
for Safe SoC. This approach has already been used for commercializing
the SA8775P, and we do not want to disrupt it now. Therefore, we believe
it is better not to add cpu cooling cell property (CPU cooling device)
in sa8775p base dtsi.
Best Regards,
Manaf
>
>> Best Regards,
>>
>> Manaf
>>
Powered by blists - more mailing lists