Date:	Tue, 30 Apr 2013 21:39:27 -0700
From:	Colin Cross <ccross@...roid.com>
To:	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:	Zoran Markovic <zoran.markovic@...aro.org>,
	lkml <linux-kernel@...r.kernel.org>,
	Linux PM list <linux-pm@...r.kernel.org>,
	Benoit Goby <benoit@...roid.com>,
	Android Kernel Team <kernel-team@...roid.com>,
	Todd Poynor <toddpoynor@...gle.com>,
	San Mehat <san@...gle.com>,
	John Stultz <john.stultz@...aro.org>,
	Pavel Machek <pavel@....cz>, "Rafael J. Wysocki" <rjw@...k.pl>,
	Len Brown <len.brown@...el.com>
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers
 which lockup during suspend.

On Tue, Apr 30, 2013 at 9:17 PM, Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
> On Tue, Apr 30, 2013 at 08:36:21PM -0700, Colin Cross wrote:
>> On Tue, Apr 30, 2013 at 4:30 PM, Greg Kroah-Hartman
>> <gregkh@...uxfoundation.org> wrote:
>> > On Tue, Apr 30, 2013 at 03:28:33PM -0700, Zoran Markovic wrote:
>> >> From: Benoit Goby <benoit@...roid.com>
>> >>
>> >> Below is a patch from the Android kernel that detects a driver suspend
>> >> lockup and captures a dump in the kernel log. Please review and provide
>> >> comments.
>> >
>> > There's this really cool thing called a watchdog driver that does stuff
>> > like this :)
>>
>> If the watchdog driver worked in this case, this patch wouldn't exist.
>
> Great, let's fix the watchdog timer then :)
>
> What's wrong with it?
>
>> >> Rather than hard-lock the kernel, dump the suspend thread stack and
>> >> BUG() when a driver takes too long to suspend.  The timeout is set to
>> >> 12 seconds to be longer than the usbhid 10 second timeout.
>> >>
>> >> Exclude from the watchdog the time spent waiting for children that
>> >> are resumed asynchronously, and time every device, whether or not it
>> >> is resumed synchronously.
>> >
>> > No, don't add a driver-core-only timer, use the existing watchdog timers
>> > if you are worried about the kernel locking up.
>>
>> The watchdog timers are useless here.  For one, they generally stop
>> when their driver suspend op is called, so you may not even have one
>> running when you lock up.
>
> But you can fix that, right?

Ah, you're talking about the lockup detectors, and not drivers/watchdog.

The hardlockup detector can tell you if timer interrupts are not
firing, which is unaffected by this patch since the timer wouldn't
fire anyway.  The softlockup detector could eventually tell you that
tasks were not being scheduled, but not why.  Even panic on softlockup
will only get you the stack trace of the current task, which will be
the locked up task if it is spinning, but is likely to be the idle
task if the suspend task is blocked on a wait_event.  This patch will
give the stack trace of the suspend operation that is blocked, even if
it is an asynchronous suspend callback.
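Below is a minimal userspace sketch of the idea. It assumes POSIX threads stand in for the kernel timer. A watchdog is armed around each device's own suspend callback and records that device's name. When it expires, it can therefore report the blocked callback even if that callback runs on a worker thread rather than the suspend thread, or if the CPU is simply idle because the callback is sleeping in a wait.

Every name here (fake_device, watchdog_set(), watchdog_clear(), suspend_device(), async_suspend()) is hypothetical and not taken from the patch. The 100 ms timeout stands in for the patch's 12 seconds. On expiry, the model only prints the device name; it does not dump the stack and call BUG().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define SUSPEND_TIMEOUT_MS 100L /* scaled down; the patch uses 12 s */

struct fake_device {
	const char *name;
	void (*suspend)(struct fake_device *dev); /* driver suspend op */
	bool watchdog_fired;                      /* result of last suspend */
};

struct suspend_watchdog {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool done;            /* callback returned */
	bool fired;           /* deadline passed while callback still ran */
	const char *dev_name; /* recorded when armed, reported on expiry */
	pthread_t thread;
};

static void *watchdog_thread(void *arg)
{
	struct suspend_watchdog *wd = arg;
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += SUSPEND_TIMEOUT_MS / 1000;
	deadline.tv_nsec += (SUSPEND_TIMEOUT_MS % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&wd->lock);
	while (!wd->done && rc == 0) /* nonzero rc: timed out */
		rc = pthread_cond_timedwait(&wd->cv, &wd->lock, &deadline);
	if (!wd->done) {
		/* The patch dumps the stack and BUG()s; the model names the device. */
		wd->fired = true;
		fprintf(stderr, "suspend watchdog: %s stuck > %ld ms\n",
			wd->dev_name, SUSPEND_TIMEOUT_MS);
	}
	pthread_mutex_unlock(&wd->lock);
	return NULL;
}

static void watchdog_set(struct suspend_watchdog *wd, const char *dev_name)
{
	pthread_mutex_init(&wd->lock, NULL);
	pthread_cond_init(&wd->cv, NULL);
	wd->done = false;
	wd->fired = false;
	wd->dev_name = dev_name;
	int rc = pthread_create(&wd->thread, NULL, watchdog_thread, wd);
	assert(rc == 0);
	(void)rc;
}

/* Disarm; returns true if the watchdog fired before the callback returned. */
static bool watchdog_clear(struct suspend_watchdog *wd)
{
	pthread_mutex_lock(&wd->lock);
	wd->done = true;
	pthread_cond_signal(&wd->cv);
	pthread_mutex_unlock(&wd->lock);
	pthread_join(wd->thread, NULL);
	pthread_cond_destroy(&wd->cv);
	pthread_mutex_destroy(&wd->lock);
	return wd->fired;
}

/* Time only this device's own callback: waiting for async children
 * (not modelled) would happen before watchdog_set() and not count. */
static bool suspend_device(struct fake_device *dev)
{
	struct suspend_watchdog wd;

	watchdog_set(&wd, dev->name);
	dev->suspend(dev);
	dev->watchdog_fired = watchdog_clear(&wd);
	return dev->watchdog_fired;
}

/* Async path: the same per-device watchdog, armed on a worker thread. */
static void *async_suspend(void *arg)
{
	suspend_device(arg);
	return NULL;
}

static void quick_suspend(struct fake_device *dev) { (void)dev; }
static void hung_suspend(struct fake_device *dev)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 500000000L }; /* 500 ms */
	(void)dev;
	nanosleep(&ts, NULL);
}
```

Calling suspend_device() on a device whose callback is quick_suspend() returns false. With hung_suspend(), it prints the device name to stderr and returns true. Starting async_suspend() on a worker thread with pthread_create() names the stuck device in the same way. Unlike a real hang, hung_suspend() returns after 500 ms so the model terminates.

The key design choice is arming the watchdog per device rather than watching the suspend thread as a whole. That is what lets the report identify the blocked callback instead of whatever task happens to be current.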

>> More importantly, the purpose of this patch is to tell you which
>> driver locked up and hopefully why, and the watchdog driver will
>> usually result in a silent reset.
>
> I thought it was an option as to what the watchdog does when it
> triggers.
>
>> This patch will cause a stack trace of the driver suspend op that is
>> blocking suspend progress, even if that call does not happen in the
>> suspend thread.
>
> But who can see this? The machine is now dead.

I'm not sure what might still be working in this situation on x86, but
on ARM the machine is dead anyway.  Some random subset of drivers is
suspended, so you probably have no hardware watchdog, no console, no
video.  kexec on panic, kgdb on panic, console messages saved in
pstore, or JTAG are the only options I know of.  This patch is very
useful in conjunction with pstore console.
