linux-kernel - Re: [non-pretimeout,4/7] Watchdog: introduce ARM SBSA watchdog driver


Message-ID: <20150623164314.GB20017@roeck-us.net>
Date:	Tue, 23 Jun 2015 09:43:14 -0700
From:	Guenter Roeck <linux@...ck-us.net>
To:	Fu Wei <fu.wei@...aro.org>
Cc:	Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
	Linaro ACPI Mailman List <linaro-acpi@...ts.linaro.org>,
	linux-watchdog@...r.kernel.org, devicetree@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
	Wei Fu <tekkamanninja@...il.com>,
	G Gregory <graeme.gregory@...aro.org>,
	Al Stone <al.stone@...aro.org>,
	Hanjun Guo <hanjun.guo@...aro.org>,
	Timur Tabi <timur@...eaurora.org>,
	Ashwin Chaugule <ashwin.chaugule@...aro.org>,
	Arnd Bergmann <arnd@...db.de>,
	Vipul Gandhi <vgandhi@...eaurora.org>,
	Wim Van Sebroeck <wim@...ana.be>,
	Jon Masters <jcm@...hat.com>, Leo Duran <leo.duran@....com>,
	Jon Corbet <corbet@....net>,
	Mark Rutland <mark.rutland@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>, rjw@...ysocki.net
Subject: Re: [non-pretimeout,4/7] Watchdog: introduce ARM SBSA watchdog driver

On Wed, Jun 24, 2015 at 12:17:19AM +0800, Fu Wei wrote:
> Hi Guenter,
> 
> you always provide help very quickly, thank you very much :-)
> 
> On 23 June 2015 at 23:21, Guenter Roeck <linux@...ck-us.net> wrote:
> > On Tue, Jun 23, 2015 at 09:26:35PM +0800, Fu Wei wrote:
> >> Hi Guenter,
> > [ ...]
> >
> >> >
> >> >> + *       When the first timeout occurs, WS0(SPI or LPI) is triggered,
> >> >> + *       the second timeout period(as long as the first timeout period) starts.
> >> >
> >> > no longer accurate if WOR is used for the second period.
> >> >
> >> >> + *       In WS0 interrupt routine, panic() will be called for collecting
> >> >> + *       crashdown info.
> >> >> + *       If system can not recover from WS0 interrupt routine, then second
> >> >> + *       timeout occurs, WS1(reset or higher level interrupt) is triggered.
> >> >> + *       The two timeout period can be set by WOR(32bit).
> >> >
> >> > The second timeout period is determined by ...
> >> >
> >> >> + *       WOR gives a maximum watch period of around 10s at the maximum
> >> >> + *       system counter frequency.
> >> >> + *       The System Counter shall run at maximum of 400MHz.
> >> >
> >> > "... at the maximum system counter frequency of 400 MHz.", and drop the
> >> > last sentence.
> >>
> >> For the second timeout period, I have discussed this with some kdump
> >> developers:
> >> (1) 10s may not be long enough for every panic + kdump case, so we may
> >> still need to use WCV in the second timeout period.
> >> (2) In the second timeout period, we may need to program WCV for two
> >> reasons: a) to trigger WS1 and reboot the system ASAP; b) to feed the
> >> watchdog without clearing the WS0 flag.
> >>
> >> Why do we want to feed the watchdog (keepalive) without clearing the
> >> WS0 flag?
> >> Reason:
> >> (1) If the system context is large, we may need to keep feeding the dog
> >> until everything has been backed up.
> >> (2) If the system goes wrong, WS0 is triggered, then panic --> kdump. If
> >> we feed the dog by writing WRR or programming WOR, the WS0 flag will be
> >> cleared. If the system then goes wrong again, it panics again, and so on.
> >> The system would be stuck in a panic--kdump--panic--kdump loop and never
> >> get a chance to reset.
> >>
> >> So in the second timeout period, we may always need to program WCV.
> >>
> > The crashdump kernel is supposed to reload the watchdog driver, which will ping
> > the watchdog. If it isn't able to do that in 10 seconds, something is wrong.
> 
> Yes, 10s may not be enough in every case.
> When I tested kdump on arm64, it sometimes took 20s. So I am
> wondering: can we make the maximum pretimeout in this driver
> greater than 10s?
> 
Does it take more than 10 seconds to load the crashdump kernel,
or does it take more than 10 seconds to complete the dump?

> 
> >
> >> >> +
> >> >> +     status = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCS);
> >> >> +     if (status & SBSA_GWDT_WCS_WS1) {
> >> >> +             dev_warn(dev, "System reset by WDT(WCV: %llx)\n",
> >> >> +                      sbsa_gwdt_get_wcv(wdd));
> >> >
> >> > WCV here only tells us how many clock cycles were executed since the
> >> > system started (or something like that). So I still don't understand
> >> > why it is valuable to print that number.
> >>
> >> This number provides the time of the system reset; I think it may help
> >> the admin analyse the system failure.
> >>
> > It doesn't mean anything to anyone but you since it is not in a well defined
> > time scale.
> 
> Maybe I should convert it to seconds?
> Or do you think the original value is better?
> 

I think you should drop it.

Guenter