Message-ID: <417f77ba-a0b0-4ba5-824d-8814b776c557@oss.qualcomm.com>
Date: Wed, 21 Jan 2026 12:36:44 +0100
From: Konrad Dybcio <konrad.dybcio@....qualcomm.com>
To: Akhil P Oommen <akhilpo@....qualcomm.com>,
        Jagadeesh Kona <jagadeesh.kona@....qualcomm.com>
Cc: Ajit Pandey <ajit.pandey@....qualcomm.com>,
        Imran Shaik <imran.shaik@....qualcomm.com>,
        Taniya Das <taniya.das@....qualcomm.com>,
        linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
        devicetree@...r.kernel.org, Sibi Sankar <sibi.sankar@....qualcomm.com>,
        Jassi Brar <jassisinghbrar@...il.com>, Rob Herring <robh@...nel.org>,
        Krzysztof Kozlowski <krzk+dt@...nel.org>,
        Conor Dooley <conor+dt@...nel.org>,
        Bjorn Andersson <andersson@...nel.org>,
        Konrad Dybcio <konradybcio@...nel.org>
Subject: Re: [PATCH 2/2] arm64: dts: qcom: SM8750: Enable CPUFreq support

On 1/20/26 9:54 PM, Akhil P Oommen wrote:
> On 1/20/2026 8:13 PM, Konrad Dybcio wrote:
>> On 1/20/26 12:25 PM, Akhil P Oommen wrote:
>>> On 1/20/2026 3:44 PM, Konrad Dybcio wrote:
>>>> On 1/19/26 8:00 PM, Akhil P Oommen wrote:
>>>>> On 12/11/2025 12:32 AM, Jagadeesh Kona wrote:
>>>>>> Add the cpucp mailbox, sram and SCMI nodes required to enable
>>>>>> the CPUFreq support using the SCMI perf protocol on SM8750 SoCs.
>>>>>>
>>>>>> Signed-off-by: Jagadeesh Kona <jagadeesh.kona@....qualcomm.com>
>>>>>
>>>>> Just curious, does this patch enable thermal mitigation for CPU clusters
>>>>> too?
>>>>
>>>> If nothing changed, we have lets-not-explode type mitigations via LMH,
>>>> but lets-not-burn-the-user would require a skin temp sensor to be
>>>> wired up, which then could be used to enable some cooling action
>>>
>>> In some chipsets, I have noticed that the gpu cooling device throttles
>>> GPU to the lowest OPP even with not-so-heavy GPU workloads, making it
>>> unusably slow. My hypothesis was that it was due to unmitigated CPU
>>> temperature tripping up GPU Tsens.
>>>
>>> So, I am wondering if there are any additional CPU cooling related
>>> changes required to get a reasonable overall performance under thermal
>>> constraints.
>>
>> Yes, something like the aforementioned skin-temp sensor at least..
> 
> I suppose this sensor is off-chip and slow to react.

Yes, this would be placed somewhere on the chassis of the device to
reflect the actual temperature that the user could experience (since
there are regulations about the maximum allowed values of that).

>> Today Linux will not throttle the CPUs at all (they're not even declared
>> as cooling devices) and we sorta agreed that in general it's a good thing
>> (tm), because otherwise we'd be coding in a cooling profile into the SoC
>> DTSI without taking into account the cooling capabilities of a given end
>> device (i.e. in an extreme case, a PC with SM8650 with a cooler that's
>> 3kg of aluminium vs a Steam Frame headset where the SoC is centimeters
>> away from your face)
>>
>> Currently, we have cooling policies for devices with fans and the only
>> other action is based on a skin temperature sensor (sc8280xp + x13s).
>> Everything else is left up to the LMH defaults. AFAIK work is ongoing to
>> create a more informed solution, that would have to (quite obviously)
>> live in userland.
> 
> I can understand that the skin-temp based mitigation is influenced by
> various design decisions outside of the SoC die. But I think there should
> be an on-chip sensor based mitigation which is fast and usually for a short
> duration which helps to avoid extreme temperature or violating the
> maximum operating point of the chipset. I guess it should depend only on
> the SoC characteristics as it is a last resort. It may be implemented in
> SW (like cooling device for Adreno GPU) or in HW. Probably the LMH
> hardware you mentioned offers this functionality for CPU clusters. I
> have no clue. :(

Yes, the CPUs are covered.

> I am hoping that if this on-chip mitigation is enabled and wired up
> correctly for CPU clusters (probably DDR too), it would reduce the
> unnecessary thermal trips on GPU Tsens and help to reach a performance
> equilibrium which is reasonably good.

Today, the OS is unaware that it can throttle anything other than the
GPU, so in its view that's the reasonable step to take. Further, any
device it knows how to throttle, it'll do so in a very jittery fashion
where it crosses the threshold, gets slowed down, cools a bit, gets
unthrottled, heats back up, rinse and repeat (because the cooling
solution of almost any form-factor is not capable of sustaining a
100% usage workload for long).
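
To illustrate the cross-the-threshold / cool / unthrottle loop described
above, here is a toy Python sketch. It is not kernel code and does not
model any real thermal governor, LMH or Tsens behaviour. The function
name and every number in it (trip point, ambient temperature, heating
and cooling rates) are made up purely for illustration.

```python
def simulate(steps=200, trip=95.0, ambient=40.0,
             heat_full=1.5, heat_throttled=0.5, cool_rate=0.02):
    """Toy single-trip-point throttle; all constants are hypothetical."""
    temp = ambient
    throttled = False
    transitions = 0
    history = []
    for _ in range(steps):
        # The workload adds less heat per step while throttled.
        heat = heat_throttled if throttled else heat_full
        # Heating from the workload minus cooling proportional to the
        # distance from ambient.
        temp += heat - cool_rate * (temp - ambient)
        # Throttle whenever at or above the trip point, unthrottle
        # as soon as we drop back below it.
        should_throttle = temp >= trip
        if should_throttle != throttled:
            transitions += 1
            throttled = should_throttle
        history.append((temp, throttled))
    return history, transitions


if __name__ == "__main__":
    history, transitions = simulate()
    print("throttle state changes:", transitions)
    print("peak temperature: %.1f" % max(t for t, _ in history))
```

With these made-up constants the unthrottled workload would settle above
the trip point and the throttled one well below it, so the state keeps
flipping once the trip point is first reached instead of settling.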

Konrad
