Message-ID: <F98D4B5C3D86834DB612ABF854C98B7FB612CB@SHSMSX101.ccr.corp.intel.com>
Date:	Tue, 23 Apr 2013 00:52:37 +0000
From:	"Pan, Zhenjie" <zhenjie.pan@...el.com>
To:	Don Zickus <dzickus@...hat.com>
CC:	Stephane Eranian <eranian@...gle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"paulus@...ba.org" <paulus@...ba.org>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"acme@...stprotocols.net" <acme@...stprotocols.net>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"Liu, Chuansheng" <chuansheng.liu@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2] NMI: fix NMI period is not correct when cpu
 frequency changes issue.



> -----Original Message-----
> From: Don Zickus [mailto:dzickus@...hat.com]
> Sent: Tuesday, April 23, 2013 2:59 AM
> To: Pan, Zhenjie
> Cc: Stephane Eranian; Peter Zijlstra; paulus@...ba.org; mingo@...hat.com;
> acme@...stprotocols.net; akpm@...ux-foundation.org; tglx@...utronix.de;
> Liu, Chuansheng; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH v2] NMI: fix NMI period is not correct when cpu
> frequency changes issue.
> 
> On Mon, Apr 22, 2013 at 12:50:34AM +0000, Pan, Zhenjie wrote:
> > > I believe it mattered to the Chrome folks. They want the watchdog to
> > > be as tight as possible so the user experience isn't a hang but a
> > > quick reboot instead. They like setting the watchdog to something like
> > > 2 seconds.
> > >
> > > There was a patch a few months ago that tried to hack around this
> > > issue and I suggested this approach as a better solution.  I forgot
> > > what the original problem was.  Perhaps someone can jump in and
> > > explain the problem being solved (other than the watchdog isn't always
> > > 10 seconds)?
> > >
> > > Cheers,
> > > Don
> >
> > Yes, I also think the period is important sometimes.
> > As I mentioned before, the case I hit is:
> > When the system hangs with interrupts disabled, we use the NMI to detect it.
> > It then detects the hard lockup and triggers a panic.
> > The panic is very important for debugging this kind of issue.
> >
> > But if the CPU frequency changes, the period becomes 2 times, 3 times or even
> > longer (if the CPU can scale down from 2.0GHz to 200MHz, it becomes 10 times
> > longer, which is a very big deviation). This makes the watchdog reset happen
> > before the hard lockup is detected.
> 
> So you are saying with the longer hard lockup delay, the iTCO_wdt is firing
> before the hard lockup detector?
> 
> Cheers,
> Don

Here is a detailed example:
0s                        50s        60s        70s
|_________________________|__________|__________|
At 50s, a watchdog interrupt fires to tell the watchdog daemon to update the watchdog.
If the watchdog daemon does not update the watchdog within 10s, a second watchdog interrupt fires at 60s and triggers a panic.
The system then has 10s to take a dump.
At 70s, the watchdog hardware resets the system.

But if interrupts are disabled at 60s, that interrupt is never serviced and the panic is lost.
So we need the NMI generated by the performance monitor to detect the hard lockup.
If the NMI period is 10s, the hard lockup is guaranteed to be detected before the 70s reset.
But if the period changes with the CPU frequency, this can no longer be guaranteed.

I hope my explanation is clear.

BTW, I use intel_scu_watchdog (though our version appears to differ substantially from the one upstream), not iTCO_wdt.

Thanks
Pan Zhenjie