Date:   Wed, 11 Jul 2018 15:36:18 -0700
From:   David Collins <collinsd@...eaurora.org>
To:     Doug Anderson <dianders@...omium.org>
Cc:     Matthias Kaehlcke <mka@...omium.org>,
        Andy Gross <andy.gross@...aro.org>,
        David Brown <david.brown@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        "open list:ARM/QUALCOMM SUPPORT" <linux-soc@...r.kernel.org>,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH 3/3] arm64: dts: qcom: pm8998: Add thermal zone

Hello Doug,

> On Tue, Jul 10, 2018 at 10:45 AM, David Collins <collinsd@...eaurora.org> wrote:
>> On 06/29/2018 04:54 PM, Matthias Kaehlcke wrote:
>>> On Fri, Jun 29, 2018 at 02:29:55PM -0700, David Collins wrote:
>> ...
>>>> The PMIC TEMP_ALARM hardware peripheral will perform an automatic partial
>>>> PMIC shutdown upon hitting over-temperature stage 2 (125 C).  This turns
>>>> off peripherals within the PMIC that are expected to draw significant
>>>> current.  The set of peripherals included varies between PMICs.  This
>>>> partial shutdown will occur simultaneously with the triggering of an
>>>> interrupt to the APPS processor that informs the qcom-spmi-temp-alarm
>>>> driver that an over-temperature threshold has been crossed.
>>>>
>>>> The TEMP_ALARM peripheral will perform an automatic full PMIC shutdown
>>>> upon hitting over-temperature stage 3 (145 C).  Software won't receive an
>>>> interrupt in this case because all power is cut.
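To make the two stages concrete, here is a small Python model of the default TEMP_ALARM behavior as described above. It is a sketch, not driver code. `tm_response`, `Response` and `Action` are hypothetical names. Temperatures are in millidegrees Celsius, the unit the Linux thermal framework uses. Anything below the stage 2 threshold is reported as stage 0 because stage 1 behavior is not described in this thread.

```python
from enum import Enum
from typing import NamedTuple

# Stage thresholds quoted in the thread, in millidegrees Celsius.
STAGE2_MC = 125_000
STAGE3_MC = 145_000


class Action(Enum):
    NONE = "no automatic action modelled"
    PARTIAL_SHUTDOWN = "high-current PMIC peripherals switched off"
    FULL_SHUTDOWN = "all PMIC power cut"


class Response(NamedTuple):
    stage: int          # 0 means "below the stage 2 threshold" in this model
    action: Action
    irq_to_apps: bool   # does the qcom-spmi-temp-alarm driver get notified?


def tm_response(temp_mc: int) -> Response:
    """Hypothetical model of the default TEMP_ALARM behavior described above."""
    if temp_mc >= STAGE3_MC:
        # Full shutdown: all power is cut, so no interrupt can be delivered.
        return Response(3, Action.FULL_SHUTDOWN, irq_to_apps=False)
    if temp_mc >= STAGE2_MC:
        # Partial shutdown happens at the same time as the interrupt.
        return Response(2, Action.PARTIAL_SHUTDOWN, irq_to_apps=True)
    return Response(0, Action.NONE, irq_to_apps=False)


for t in (120_000, 130_000, 150_000):
    r = tm_response(t)
    print(t, r.stage, r.irq_to_apps)
```

The key asymmetry the model captures: at stage 2 software is told (but is racing the partial shutdown), while at stage 3 software is never told at all.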
>>>
>>> This information is very useful, thanks David!
>>>
>>> The (partial) hardware shutdown seems like a good measure of last
>>> resort; however, I suppose we prefer Linux to initiate a shutdown
>>> before losing part of the peripherals (drivers might not be happy
>>> about this and probably won't recover even when the temperature goes
>>> down again) or reaching a full PMIC shutdown.
>>>
>>> Please let me know if there are reasons to prefer going to the hardware
>>> limits. It's also an option for device makers to override these
>>> settings if they want different behavior.
>>
>> Disabling stage 3 automatic full PMIC shutdown at 145 C is definitely a
>> bad idea.  This exists as a last resort in order to save the hardware and
>> ensure end user safety in case of excessive temperature even if software
>> is locked up.
>>
>> Disabling stage 2 automatic partial PMIC shutdown at 125 C is not
>> recommended as the PMIC is already outside of reasonable operating
>> conditions and needs to take corrective action quickly.  However, doing so
>> may be acceptable if software is taking action to shut down the system
>> immediately upon receiving the stage 2 over-temperature interrupt.
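The guidance above reduces to a simple rule: stage 3 must stay enabled, and stage 2 may only be disabled when software shuts the system down immediately on the stage 2 interrupt. Here it is as a hypothetical Python check. `config_is_acceptable` and its parameters are illustrative names, not part of any driver or device tree binding.

```python
def config_is_acceptable(stage2_hw_shutdown: bool,
                         stage3_hw_shutdown: bool,
                         sw_shuts_down_on_stage2_irq: bool) -> bool:
    """Encode the stage 2 / stage 3 guidance as a check (hypothetical helper)."""
    # Stage 3 is the last resort that protects the hardware and the user
    # even if software is locked up, so it must never be disabled.
    if not stage3_hw_shutdown:
        return False
    # Disabling stage 2 is only tolerable if software shuts the system
    # down immediately when the stage 2 interrupt arrives.
    if not stage2_hw_shutdown:
        return sw_shuts_down_on_stage2_irq
    # Both hardware stages left enabled (the defaults).
    return True


print(config_is_acceptable(stage2_hw_shutdown=False,
                           stage3_hw_shutdown=True,
                           sw_shuts_down_on_stage2_irq=True))  # True
```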
>
> Just to confirm: is it expected that at stage 2 the CPUs on the SoC
> should continue running even with partial PMIC shutdown enabled?

This is not guaranteed.


> It sounded to me like partial PMIC shutdown was supposed to shut down
> high-power rails that were not essential to the task of performing an
> orderly shutdown.

Shutting down high-power peripherals is accurate; however, special care is
not taken to ensure that an orderly shutdown is possible.  At the very
least, the HW and SW state will be out of sync for the peripherals that
are shut down.


> I think Matthias was seeing that when he reached stage 2 and partial
> PMIC shutdown happened that the system was just falling on the floor.
> ...maybe we just have things configured incorrectly?

More information about the exact crash steps would be helpful to
investigate this further.  I'm not sure how much time you want to put into
it though.  Disabling stage 2 partial shutdown and then using software to
perform a controlled shutdown at 125 C is probably the best option for you
at this point.
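A toy comparison of the two configurations, assuming software starts an orderly shutdown as soon as it receives the stage 2 interrupt. `first_event` and the sample readings are hypothetical; the model only reports which action happens first as the temperature rises. On a real Linux system the software side would usually be a thermal zone with a critical trip point at 125 C. How the hardware partial shutdown actually gets disabled is not covered here.

```python
from typing import Iterable, Optional, Tuple

# Stage thresholds from the thread, in millidegrees Celsius.
STAGE2_MC = 125_000
STAGE3_MC = 145_000


def first_event(readings_mc: Iterable[int],
                stage2_hw_shutdown: bool) -> Tuple[str, Optional[int]]:
    """Return (event, temperature) for the first shutdown-related event.

    Hypothetical model: software is assumed to begin an orderly shutdown
    as soon as the stage 2 interrupt arrives.
    """
    for t in readings_mc:
        if t >= STAGE3_MC:
            # Hardware cuts all PMIC power; software never gets a say.
            return ("hw_full_shutdown", t)
        if t >= STAGE2_MC:
            if stage2_hw_shutdown:
                # Partial shutdown fires together with the interrupt, so
                # software reacts after some rails are already off.
                return ("hw_partial_shutdown", t)
            # Partial shutdown disabled: software handles the interrupt
            # and shuts down while all peripherals are still powered.
            return ("sw_orderly_shutdown", t)
    return ("none", None)


ramp = [95_000, 110_000, 126_000, 150_000]
print(first_event(ramp, stage2_hw_shutdown=True))   # ('hw_partial_shutdown', 126000)
print(first_event(ramp, stage2_hw_shutdown=False))  # ('sw_orderly_shutdown', 126000)
```

With the partial shutdown still enabled, the interrupt and the rail cuts coincide, so any software shutdown starts from the out-of-sync HW/SW state mentioned earlier; with it disabled, stage 3 at 145 C still remains as the hardware backstop.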

Take care,
David

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
