[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <EEBA739CCF11FE49B73E1FB4690F5EE649E29D57@SHSMSX101.ccr.corp.intel.com>
Date: Thu, 5 Jul 2018 06:43:06 +0000
From: "Tian, Baofeng" <baofeng.tian@...el.com>
To: Guenter Roeck <linux@...ck-us.net>,
"wim@...ux-watchdog.org" <wim@...ux-watchdog.org>,
"linux-watchdog@...r.kernel.org" <linux-watchdog@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] watchdog: add NMI handler for iTCO watchdog
Hi, Roeck
Yes, as long as NMI interrupt generated, this handler will be called before reboot.
We have another patch to handle NMI generation on X86 platform, it use perf event to generate NMI in TCO WDT driver.
Will submit later.
Thanks
Tim
-----Original Message-----
From: Guenter Roeck [mailto:groeck7@...il.com] On Behalf Of Guenter Roeck
Sent: Monday, July 2, 2018 9:38 PM
To: Tian, Baofeng <baofeng.tian@...el.com>; wim@...ux-watchdog.org; linux-watchdog@...r.kernel.org; linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog: add NMI handler for iTCO watchdog
On 07/02/2018 01:15 AM, Tian, Baofeng wrote:
> From: "Tian, Baofeng" <baofeng.tian@...el.com
> <mailto:baofeng.tian@...el.com>>
> Subject: [PATCH] watchdog: add NMI handler for iTCO watchdog
>
> Add a NMI handler in iTCO watchdog driver probe.
> When iTCO watchdog timeout, it gernerate a NMI interrupt.
generate
Does it always do that ? My understanding is that it has to be configured to do so.
> The NMI handler will be called when iTCO NMI interrupt triggered.
>
> In the NMI handler, it will dump all cpu backtraces and call panic to
> continue reboot.
>
> Signed-off-by: Tian, Baofeng <baofeng.tian@...el.com
> <mailto:baofeng.tian@...el.com>>
As with other patches, the e-mail address is messed up.
If I recall correctly, we already had a similar patch trying to introduce MNI handling to the driver a long time ago. I don't recall why it did not make it in, but there must be some caveat. Someone will have to look that up and explain why this submission is different and works. It would be great if you can do that; otherwise I'll have to do it. Given my time constraints, that may take a while.
> ---
> drivers/watchdog/iTCO_wdt.c | 53
> +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index 347f038..7a79d0f 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -68,6 +68,8 @@
> #include <linux/io.h> /* For inb/outb/... */
> #include <linux/platform_data/itco_wdt.h>
>
> +#include <linux/nmi.h>
> +#include <asm/nmi.h>
> #include "iTCO_vendor.h"
>
> /* Address definitions for the TCO */ @@ -86,6 +88,11 @@
> #define TCO2_CNT(p) (TCOBASE(p) + 0x0a) /* TCO2 Control Register
> */
> #define TCOv2_TMR(p) (TCOBASE(p) + 0x12) /* TCOv2 Timer Initial
> Value*/
>
> +#define TCO_RLD_sub(base) (base + 0x00) /* TCO Timer Reload/Curr.
> +Value */ #define TCO1_STS_sub(base) (base + 0x04) /* TCO1 Status
> +Register */ #define TCOv2_TMR_sub(base) (base + 0x12) /* TCOv2
> +Timer Initial Value*/
> +
> +
> /* internal variables */
> struct iTCO_wdt_private {
> struct watchdog_device wddev;
> @@ -112,6 +119,14 @@ struct iTCO_wdt_private {
> int (*update_no_reboot_bit)(void *p, bool set);
> };
>
> +static struct {
> + resource_size_t tco_base_address;
> + unsigned int iTCO_version;
Where is this variable used ?
> + bool pretimeout_occurred;
> + unsigned int second_to_ticks;
What is the point of this variable ? It is used as constant.
> +} iTCO_wdt_sub;
> +
> +
Checkpatch should tell you to avoid more than one empty line.
> /* module parameters */
> #define WATCHDOG_TIMEOUT 30 /* 30 sec default heartbeat */
> static int heartbeat = WATCHDOG_TIMEOUT; /* in seconds */ @@ -422,6
> +437,33 @@ static const struct watchdog_ops iTCO_wdt_ops = {
> .get_timeleft = iTCO_wdt_get_timeleft,
> };
>
> +static int iTCO_pretimeout(unsigned int cmd, struct pt_regs
> +*unused_regs) {
> + resource_size_t tco_base_address;
What is the point of this variable ?
> +
> + /* Prevent re-entrance */
> + if (iTCO_wdt_sub.pretimeout_occurred)
> + return NMI_HANDLED;
> +
This is odd and suggests that something may be wrong. There should not be more than one NMI.
> + tco_base_address = iTCO_wdt_sub.tco_base_address;
At the very least column alignment is wrong.
> +
> + /* Check the NMI is from the TCO first expiration */
> + if (inw(TCO1_STS_sub(tco_base_address)) & 0x8) {
> + iTCO_wdt_sub.pretimeout_occurred = true;
> +
> + /* Forward next expiration */
> + outw(iTCO_wdt_sub.second_to_ticks,
> +TCOv2_TMR_sub(tco_base_address));
> + outw(0x01, TCO_RLD_sub(tco_base_address));
> +
> + trigger_all_cpu_backtrace();
A wild guess is that arch_trigger_cpumask_backtrace() is not exported.
That means you can not call that function from here, and it strongly suggests that the function is not supposed to be called from drivers, including watchdog_drivers. Besides, panic() already triggers a backtrace.
> + panic_timeout = 0;
> + panic("Kernel Watchdog");
> + return NMI_HANDLED;
> + }
> +
> + return NMI_DONE;
> +}
> +
> /*
> * Init & exit routines
> */
> @@ -555,6 +597,17 @@ static int iTCO_wdt_probe(struct platform_device
> *pdev)
> return ret;
> }
>
> + /* init vars that used for nmi handler */
> + iTCO_wdt_sub.iTCO_version = p->iTCO_version;
> + iTCO_wdt_sub.second_to_ticks = seconds_to_ticks(p, 10);
Where does the 10 come from ? What is the rationale ?
> + iTCO_wdt_sub.tco_base_address = TCOBASE(p);
> +
> + ret = register_nmi_handler(NMI_LOCAL, iTCO_pretimeout, 0,
> +"iTCO_wdt");
> + if (ret != 0) {
> + pr_err("cannot register nmi handler (err=%d)\n", ret);
> + return ret;
Return an error and leave the watchdog driver registered ? That will cause some nice crashes.
> + }
> +
> pr_info("initialized. heartbeat=%d sec (nowayout=%d)\n",
> heartbeat, nowayout);
>
> --
> 2.7.4
>
Powered by blists - more mailing lists