Message-ID: <20110125032401.GA13618@cr0.nay.redhat.com>
Date: Tue, 25 Jan 2011 11:24:01 +0800
From: Américo Wang <xiyou.wangcong@...il.com>
To: Anithra P Janakiraman <anithra@...ux.vnet.ibm.com>
Cc: linux-kernel@...r.kernel.org, dave@...ux.vnet.ibm.com,
xiyou.wangcong@...il.com, sugaken.r3@...il.com,
alan@...rguk.ukuu.org.uk, srikar@...ux.vnet.ibm.com,
suzuki@...ibm.com, vatsa@...ux.vnet.ibm.com, ananth@...ibm.com
Subject: Re: [PATCH v2] Softdog enhancement to optionally invoke panic
instead of reboot on timer expiry
On Tue, Jan 25, 2011 at 12:03:52AM +0530, Anithra P Janakiraman wrote:
>
>Hi,
>
>We currently have no way of determining the reason for failure when a
>softdog timeout occurs. We use softdog to watch for critical application
>failures, and at the minimum a snapshot of the system would help to
>determine the cause. In such a scenario the application could fail but
>there isn't a softlockup as such, hence the detect softlockup feature
>does not help.
>The patch below adds a module parameter, soft_panic, which when set to
>1 causes softdog to invoke panic instead of reboot when the softdog
>timer expires. By invoking panic we execute kdump if it is configured,
>and the vmcore generated by kdump should provide at least a minimal idea
>of the reason for failure.
>
>Based on an original patch by Ken Sugawara <sugaken.r3@...il.com>
>Signed-off-by: Anithra P J <anithra@...ux.vnet.ibm.com>
Cool, using a module parameter is better.
Reviewed-by: WANG Cong <xiyou.wangcong@...il.com>
Thanks.
>---
> drivers/watchdog/softdog.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
>Index: linux-2.6.38-rc1/drivers/watchdog/softdog.c
>===================================================================
>--- linux-2.6.38-rc1.orig/drivers/watchdog/softdog.c
>+++ linux-2.6.38-rc1/drivers/watchdog/softdog.c
>@@ -48,6 +48,7 @@
> #include <linux/init.h>
> #include <linux/jiffies.h>
> #include <linux/uaccess.h>
>+#include <linux/kernel.h>
>
> #define PFX "SoftDog: "
>
>@@ -75,6 +76,13 @@
> "Softdog action, set to 1 to ignore reboots, 0 to reboot "
> "(default depends on ONLY_TESTING)");
>
>+
>+static int soft_panic;
>+
>+module_param(soft_panic, int, 0);
>+MODULE_PARM_DESC(soft_panic,
>+	"Softdog action, set to 1 to panic, 0 to reboot (default 0)");
>+
> /*
> * Our timer
> */
>@@ -98,7 +106,10 @@
>
> 	if (soft_noboot)
> 		printk(KERN_CRIT PFX "Triggered - Reboot ignored.\n");
>-	else {
>+	else if (soft_panic) {
>+		printk(KERN_CRIT PFX "Initiating panic.\n");
>+		panic("Software Watchdog Timer expired.");
>+	} else {
> 		printk(KERN_CRIT PFX "Initiating system reboot.\n");
> 		emergency_restart();
> 		printk(KERN_CRIT PFX "Reboot didn't ?????\n");
>@@ -267,7 +278,8 @@
> };
>
> static char banner[] __initdata = KERN_INFO "Software Watchdog Timer: 0.07 "
>-	"initialized. soft_noboot=%d soft_margin=%d sec (nowayout= %d)\n";
>+	"initialized. soft_noboot=%d soft_margin=%d sec soft_panic=%d "
>+	"(nowayout= %d)\n";
>
> static int __init watchdog_init(void)
> {
>@@ -298,7 +310,7 @@
> 		return ret;
> 	}
>
>-	printk(banner, soft_noboot, soft_margin, nowayout);
>+	printk(banner, soft_noboot, soft_margin, soft_panic, nowayout);
>
> 	return 0;
> }
>
>
>