Message-ID: <20160119090623.GA29678@gmail.com>
Date:	Tue, 19 Jan 2016 10:06:23 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Grygorii Strashko <grygorii.strashko@...com>
Cc:	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	Keerthy <a0393675@...com>, Keerthy <j-keerthy@...com>,
	linux-kernel@...r.kernel.org, edubezval@...il.com, nm@...com,
	linux-pm@...r.kernel.org, linux-omap@...r.kernel.org,
	joel@....id.au, akpm@...ux-foundation.org,
	linux-arm-kernel@...ts.infradead.org, peterz@...radead.org,
	dyoung@...hat.com, josh@...htriplett.org, mpe@...erman.id.au,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH v2] reboot: Backup orderly_poweroff
* Grygorii Strashko <grygorii.strashko@...com> wrote:
> On 01/15/2016 12:14 PM, Ingo Molnar wrote:
> > 
> > * One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk> wrote:
> > 
> >>> If kernel_power_off() is called then the system should power off. No ifs and
> >>> whens.
> >>
> >> Even if it doesn't the watchdog should kill it.
> >>
> >> That is broken on some platforms on the watchdog side as the
> >> watchdog shuts down during our power off callbacks - because the system
> >> firmware is too stupid to reset the watchdog as it powers back up (so
> >> keeps rebooting).
> >>
> >> If your watchdog and firmware function properly you shouldn't even have to
> >> care if you crash during the kernel power off.
> > 
> > That's a good point as well - if the system is 'stuck' for some notion of stuck,
> > then watchdog drivers can help.
> > 
> 
> Seems ARM doesn't have an endless loop implemented in machine_power_off() - so
> there is not much chance for the watchdog to fire.
> void machine_power_off(void)
> {
> 	local_irq_disable();
> 	smp_send_stop();
> 
> 	if (pm_power_off)
> 		pm_power_off();
> 
> 	--- endless loop ?
> 	--- or restart ?
> }
> [and even if it were there - 20-30 sec is the usual watchdog timeout, and this is
> enough time to burn the system in case of a thermal emergency poweroff :(]
> 
> > Here it's unclear whether user-space even called the sys_reboot() system call.
> > 
> 
> That's true - original log [1] has 
> Nov 30 11:19:22 [    5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down
> [...]
> Nov 30 11:19:24 [    7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst 
> Nov 30 11:19:24 INIT: Switching to runlevel: 0
> Nov 30 11:19:24 INIT: Sending processes the TERM signal
> 
> and there is no
> [  220.004522] reboot: Power down
> 
> 
> Also, it's not the first time this part of the code (thermal emergency poweroff) has been discussed [2],
> so the real question, as I see it, is whether it is really required and safe to use orderly_poweroff() in
> the thermal emergency poweroff case ([3] as an example)?
> 
> In general, this kind of use case can be simulated using SysRq on any arch
> - [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000)
>   INIT: version 2.88 booting
>   Starting udev
> ^^ The issue most probably happens when the system is in the process of loading modules.
> So, once the module loading process has started - fire SysRq "poweroff(o)"
So I'd say emergency poweroff should be named accordingly - and the
orderly_poweroff() name suggests anything but an emergency, right?
So I'd be fine with the following:
 - introduce a poweroff_emergency() core kernel function call
 - use it in drivers where it's justified
 - poweroff_emergency() has a configurable timeout value. If the timeout value is
   set to 0 then it powers the system off immediately.
Functionally it would be mostly equivalent to your current patch (except the '0' 
immediate poweroff functionality).
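To make the intended semantics concrete, here is a rough userspace model of the
behaviour described above. It is only a sketch: poweroff_emergency(), the state
struct, the tick helper and the two stubs are hypothetical names, the stubs merely
stand in for orderly_poweroff() and kernel_power_off(), and a real implementation
would presumably use delayed work instead of an explicit tick function.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. The two stubs stand in for the kernel's
 * orderly_poweroff() and kernel_power_off(); they just count calls. */
static int orderly_requests;	/* times the orderly path was requested */
static int forced_poweroffs;	/* times the forced path was taken */

static void stub_orderly_poweroff(void) { orderly_requests++; }
static void stub_kernel_power_off(void) { forced_poweroffs++; }

/* Hypothetical state for one emergency poweroff request. */
struct poweroff_emergency_state {
	bool armed;			/* backup poweroff still pending */
	unsigned int deadline_ms;	/* time at which the backup fires */
};

/*
 * Hypothetical poweroff_emergency(): a timeout of 0 powers off
 * immediately; otherwise ask user space for an orderly shutdown and
 * arm a backup that forces the poweroff after timeout_ms.
 */
static void poweroff_emergency(struct poweroff_emergency_state *st,
			       unsigned int now_ms, unsigned int timeout_ms)
{
	assert(st);	/* model invariant: caller passes valid state */

	if (timeout_ms == 0) {
		st->armed = false;
		stub_kernel_power_off();
		return;
	}

	stub_orderly_poweroff();
	st->armed = true;
	st->deadline_ms = now_ms + timeout_ms;
}

/* Models the backup path: force the poweroff once the deadline passes. */
static void poweroff_emergency_tick(struct poweroff_emergency_state *st,
				    unsigned int now_ms)
{
	assert(st);

	if (st->armed && now_ms >= st->deadline_ms) {
		st->armed = false;
		stub_kernel_power_off();
	}
}
```

In this model a thermal driver would call poweroff_emergency() with the configured
timeout, and the tick helper plays the role of the backup timer: if user space has
not powered the system off by the deadline, the forced path runs.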
Thanks,
	Ingo