linux-kernel - Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b0830fa9-9a3b-6862-6cab-274c17810227@ti.com>
Date:   Wed, 12 Apr 2017 22:14:36 +0530
From:   Keerthy <j-keerthy@...com>
To:     Grygorii Strashko <grygorii.strashko@...com>,
        Eduardo Valentin <edubezval@...il.com>,
        Zhang Rui <rui.zhang@...el.com>
CC:     <linux-pm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <linux-omap@...r.kernel.org>, <nm@...com>, <t-kristo@...com>
Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism



On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote:
> 
> 
> On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
>> Hello,
>>
> ...
> 
>>
>> I agree. But there it nothing that says it is not reenterable. If you
>> saw something in this line, can you please share?
>>
>>>>> will you generate a patch to do this?
>>>> Sure. I will generate a patch to take care of 1) To make sure that
>>>> orderly_poweroff is called only once right away. I have already
>>>> tested.
>>>>
>>>> for 2) Cancel all the scheduled work queues to monitor the
>>>> temperature.
>>>> I will take some more time to make it and test.
>>>>
>>>> Is that okay? Or you want me to send both together?
>>>>
>>> I think you can send patch for step 1 first.
>>
>> I am happy to see that Keerthy found the problem with his setup and a
>> possible solution. But I have a few concerns here.
>>
>> 1. If regular shutdown process takes 10seconds, that is a ballpark that
>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait
>> flag set. That means, if regular userland shutdown takes 10s, we are
>> waiting for it. Obviously this not acceptable. Specially if you setup
>> critical trip to be 125C. Now, if you properly size the critical trip to
>> fire before hotspot really reach 125C, for 10s (or the time it takes to
>> shutdown), then fine. But based on what was described in this thread,
>> his system is waiting 10s on regular shutdown, and his silicon is on
>> out-of-spec temperature for 10s, which is wrong.
>>
>> 2. The above scenario is not acceptable in a long run, specially from a
>> reliability perspective. If orderly_poweroff() has a possibility to
>> simply never return (or take too long), I would say the thermal
>> subsystem is using the wrong API.
>>
> 
> 
> Hh, I do not see that orderly_poweroff() will wait for anything now:
> void orderly_poweroff(bool force)
> {
> 	if (force) /* do not override the pending "true" */
> 		poweroff_force = true;
> 	schedule_work(&poweroff_work); 
> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure
> }
> 
> 
> static int __orderly_poweroff(bool force)
> {
> 	int ret;
> 
> 	ret = run_cmd(poweroff_cmd);

When i tried with multiple orderly_poweroff calls ret was always 0.
So every 250mS i see this ret = 0.

> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC
> 
> 	if (ret && force) {

So it never entered this path. ret = 0 so if is not executed.

> 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
> 
> 		/*
> 		 * I guess this should try to kick off some daemon to sync and
> 		 * poweroff asap.  Or not even bother syncing if we're doing an
> 		 * emergency shutdown?
> 		 */
> 		emergency_sync();
> 		kernel_power_off();
> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist
> 	}
> 
> 	return ret;
> }
> 
> static bool poweroff_force;
> 
> static void poweroff_work_func(struct work_struct *work)
> {
> 	__orderly_poweroff(poweroff_force);
> }
> 
> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result
> of US poweroff binary execution.
> 
>>
>> If you are going to implement the above two patches, keep in mind:
>> i. At least within the thermal subsystem, you need to take care of all
>> zones that could trigger a shutdown.
>> ii. serializing the calls to orderly_poweroff() seams to be more
>> concerning than cancelling all monitoring.
>>
>>
>