Message-ID: <CAD=FV=XbnuNg_B_Uoj8j9hz0t7=ORdcVB16oDmVk1q2iZYGs9A@mail.gmail.com>
Date: Wed, 11 Jul 2018 15:43:34 -0700
From: Doug Anderson <dianders@...omium.org>
To: David Collins <collinsd@...eaurora.org>
Cc: Matthias Kaehlcke <mka@...omium.org>,
Andy Gross <andy.gross@...aro.org>,
David Brown <david.brown@...aro.org>,
Rob Herring <robh+dt@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
"open list:ARM/QUALCOMM SUPPORT" <linux-soc@...r.kernel.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
LKML <linux-kernel@...r.kernel.org>,
Stephen Boyd <sboyd@...nel.org>
Subject: Re: [PATCH 3/3] arm64: dts: qcom: pm8998: Add thermal zone

Hi,

On Wed, Jul 11, 2018 at 3:36 PM, David Collins <collinsd@...eaurora.org> wrote:
> Hello Doug,
>
>> On Tue, Jul 10, 2018 at 10:45 AM, David Collins <collinsd@...eaurora.org> wrote:
>>> On 06/29/2018 04:54 PM, Matthias Kaehlcke wrote:
>>>> On Fri, Jun 29, 2018 at 02:29:55PM -0700, David Collins wrote:
>>> ...
>>>>> The PMIC TEMP_ALARM hardware peripheral will perform an automatic partial
>>>>> PMIC shutdown upon hitting over-temperature stage 2 (125 C). This turns
>>>>> off peripherals within the PMIC that are expected to draw significant
>>>>> current. The set of peripherals included varies between PMICs. This
>>>>> partial shutdown will occur simultaneously with the triggering of an
>>>>> interrupt to the APPS processor that informs the qcom-spmi-temp-alarm
>>>>> driver that an over-temperature threshold has been crossed.
>>>>>
>>>>> The TEMP_ALARM peripheral will perform an automatic full PMIC shutdown
>>>>> upon hitting over-temperature stage 3 (145 C). Software won't receive an
>>>>> interrupt in this case because all power is cut.
>>>>
>>>> This information is very useful, thanks David!
>>>>
>>>> The (partial) hardware shutdown seems like a good measure of last
>>>> resort, however I suppose we prefer Linux to initiate a shutdown
>>>> before losing part of the peripherals (drivers might not be happy
>>>> about this and probably not recover even when the temperature goes
>>>> down again) or reach a full PMIC shutdown.
>>>>
>>>> Please let me know if there are reasons to prefer to go the hardware
>>>> limits, it's also an option for device makers to overwrite these
>>>> settings if they want different behavior.
>>>
>>> Disabling stage 3 automatic full PMIC shutdown at 145 C is definitely a
>>> bad idea. This exists as a last resort in order to save the hardware and
>>> ensure end user safety in case of excessive temperature even if software
>>> is locked up.
>>>
>>> Disabling stage 2 automatic partial PMIC shutdown at 125 C is not
>>> recommended as the PMIC is already outside of reasonable operating
>>> conditions and needs to take corrective action quickly. However, doing so
>>> may be acceptable if software is taking action to shut down the system
>>> immediately upon receiving the stage 2 over-temperature interrupt.
>>
>> Just to confirm: is it expected that at stage 2 the CPUs on the SoC
>> should continue running even with partial PMIC shutdown enabled?
>
> This is not guaranteed.
>
>
>> It sounded to me like partial PMIC shutdown was supposed to shut down
>> high-power rails that were not essential to the task of performing an
>> orderly shutdown.
>
> Shutting down high-power peripherals is accurate; however, special care is
> not taken to ensure that an orderly shutdown is possible. At the very
> least, the HW and SW state will be out of sync for the peripherals that
> are shut down.

OK, I guess I'm confused now. Why does partial PMIC shutdown even
exist then? What is the point of leaving some rails alive if software
could stop running? It seems like it would be better to just shut
everything down.

Said another way: can you describe what benefit you see for only
partially shutting down the PMIC at stage 2 compared to just fully
shutting it down at stage 2?

>> I think Matthias was seeing that when he reached stage 2 and partial
>> PMIC shutdown happened that the system was just falling on the floor.
>> ...maybe we just have things configured incorrectly?
>
> More information about the exact crash steps would be helpful to
> investigate this further. I'm not sure how much time you want to put into
> it though.

Matthias can add more, but basically he heated the system up and when
it reached the stage 2 shutdown it was no longer responsive.

> Disabling stage 2 partial shutdown and then using software to
> perform a controlled shutdown at 125 C is probably the best option for you
> at this point.

This seems OK to me given that I don't understand the original purpose
of the partial PMIC shutdown. Would you expect that all upstream PMIC
users would want stage 2 partial shutdown disabled, so we should just
do this for all users of the PMIC?
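
To make the software option concrete, a thermal zone along these lines
would have the kernel's thermal framework start a shutdown once the PMIC
temperature reaches 125 C. This is only a sketch: the zone and trip names,
the polling delays, the hysteresis value, and the pm8998_temp sensor label
are assumptions for illustration, and it does not show how stage 2 partial
shutdown itself would be disabled.

```dts
/ {
	thermal-zones {
		/* Hypothetical zone name, for illustration only. */
		pm8998-thermal {
			polling-delay-passive = <250>;	/* ms, assumed value */
			polling-delay = <1000>;		/* ms, assumed value */

			/* Assumes the TEMP_ALARM node is labeled pm8998_temp. */
			thermal-sensors = <&pm8998_temp>;

			trips {
				/* Software shutdown at the stage 2 threshold. */
				pm8998_crit: pm8998-crit {
					temperature = <125000>;	/* millicelsius */
					hysteresis = <2000>;	/* assumed value */
					type = "critical";
				};
			};
		};
	};
};
```

The hardware stage 3 full shutdown at 145 C would stay enabled as the last
resort in any case.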

-Doug