Message-ID: <abf93eec-890f-4c3e-68fa-58c10678dde9@ti.com>
Date:   Wed, 12 Apr 2017 13:25:50 +0530
From:   Keerthy <j-keerthy@...com>
To:     Eduardo Valentin <edubezval@...il.com>,
        Zhang Rui <rui.zhang@...el.com>
CC:     <linux-pm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-omap@...r.kernel.org>, <nm@...com>, <t-kristo@...com>
Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism



On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote:
> Keerthy,
> 
> On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote:
>>
>>
>> On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote:
>>> On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote:
>>>>
>>>> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote:
>>>>>>
>>>>>> off).
> 
> <cut>
> 
>>>>> OK... This seems to me still a corner case supposed to be fixed at
>>>>> orderly_power_off, not at thermal. But..
>>>>>
> 
> ^^^ Then again, this must be fixed not at thermal core. And re-reading
> the history of this thread, this seems to be really something broken at
> OMAP/DRA7, as mentioned in previous messages. That is probably why you
> are pushing for pm_power_off(), which seems to be the one that works for
> you, pulling the plug correctly (DRA7).

Zhang/Eduardo,

OMAP5/DRA7 is one case.

I believe this is the root cause of this failure:

thermal_zone_device_check --> thermal_zone_device_update -->
handle_thermal_trip --> handle_critical_trips --> orderly_poweroff

The above sequence happens every 250/500 ms, depending on the
configuration, so orderly_poweroff gets called every 250/500 ms. With a
full-fledged NFS file system I see that shutdown takes at least 5-10
seconds, and during that time orderly_poweroff is called many more times
because thermal_zone_device_check keeps triggering periodically.

To confirm this, I made sure that handle_critical_trips calls
orderly_poweroff only once, and I no longer see the failure on the
DRA72-EVM board.

So IMHO once we get to the handle_critical_trips case where we call
orderly_poweroff, we need to do the following:

1) Make sure that orderly_poweroff is called only once.
2) Cancel all the scheduled work queues to monitor the temperature as
we have already reached a point of shutting down the system.
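
Roughly, the two points above could look like the sketch below. This is
only a userspace model, not the actual thermal core code: the stub, the
flag and the simulate_polling() helper are names made up for
illustration, and a plain bool stands in for cancelling the zone's
delayed work.

```c
/* Userspace model only. The names mirror the call chain described above,
 * but nothing here is the kernel's actual implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_flag poweroff_requested = ATOMIC_FLAG_INIT;
bool polling_enabled = true;	/* stands in for the zone's periodic work */
int poweroff_calls;		/* how many times the stub was invoked */

/* Stub for orderly_poweroff(); the real one asks userspace to shut down. */
void orderly_poweroff_stub(void)
{
	poweroff_calls++;
}

/* Stand-in for cancelling the scheduled temperature monitoring. */
void stop_polling(void)
{
	polling_enabled = false;
}

/* Critical trip handler: only the first caller triggers the shutdown. */
void handle_critical_trip(void)
{
	if (atomic_flag_test_and_set(&poweroff_requested))
		return;		/* shutdown already in progress */

	stop_polling();
	orderly_poweroff_stub();
}

/*
 * Models up to `ticks` polling periods elapsing while the temperature
 * stays critical. Returns how many polls actually ran.
 */
int simulate_polling(int ticks)
{
	int polls = 0;

	assert(ticks >= 0);
	for (int i = 0; i < ticks; i++) {
		if (!polling_enabled)
			break;
		polls++;
		handle_critical_trip();
	}
	return polls;
}
```

With a guard like this, the 20-40 polling periods that fit into a 5-10
second shutdown at 250 ms would result in a single poweroff request
instead of one per period.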

Let me know your thoughts on this.

Best Regards,
Keerthy
> 
>>>>>>
>>>>>>
>>>>>> However, there is no clean way of detecting such failure of
>>>>>> userspace powering off the system. In such scenarios, it is
>>>>>> necessary for a backup workqueue to be able to force a shutdown of
>>>>>> the system when orderly shutdown is not successful after a
>>>>>> configurable time period.
>>>>>>
>>>>> Given that system running hot is a thermal issue, I guess we care
>>>>> more about this matter then..
>>>> Yes!
>>>>
>>> I just read this thread again
>>> https://patchwork.kernel.org/patch/8024581/ to recall the previous
>>> discussion.
>>>
>>> https://patchwork.kernel.org/patch/8149891/
>>> https://patchwork.kernel.org/patch/8149861/
>>> should be the solution made based on Ingo's suggestion, right?
>>>
>>> And to me, this sounds like the right direction to go, thermal does not
>>> need a back up shutdown solution, it just needs a kernel function call
>>> which guarantees the system can be shutdown/reboot immediately.
>>>
>>> Is there any reason that patch 1/2 is not accepted?
>>
>> Zhang,
>>
>> http://www.serverphorums.com/read.php?12,1400964
>>
>> I got a NAK from Alan and was given this direction on a thermal_poweroff
>> which is more or less what is done in this patch.
>>
> 
> 
> Actually, Alan's suggestion is more for you to define a
> thermal_poweroff() that can be defined per architecture.
> 
> Also, please, keep track of your patch versions and also do copy
> everybody who has stated their opinion on previous discussions. These
> patches must have Ingo, Alan, and RMK copied too. In this way we avoid
> losing track of what has been suggested and we also converge faster to
> something everybody (or most of us) agree on. Next version, please, fix
> that.
> 
> 
> To me, thermal core needs a function that simply powers off the system.
> No timeouts, delayed works, backups, etc. Simple and straight.
> 
> The idea of having a per architecture implementation, as per Alan's
> suggestion, makes sense to me too. Having something different from
> pm_power_off(), specific to thermal, might also give the opportunity to
> save the power off reason.
> 
> BR,
> 
> Eduardo Valentin
> 
