Message-ID: <20070828170556.GI1645@frankl.hpl.hp.com>
Date:	Tue, 28 Aug 2007 10:05:56 -0700
From:	Stephane Eranian <eranian@....hp.com>
To:	Daniel Walker <dwalker@...sta.com>
Cc:	Björn Steinbrink <B.Steinbrink@....de>,
	ak@...e.de, linux-kernel@...r.kernel.org, akpm@...ux-foundation.org
Subject: Re: nmi_watchdog=2 regression in 2.6.21

Daniel,

On Tue, Aug 28, 2007 at 07:34:44AM -0700, Daniel Walker wrote:
> On Tue, 2007-08-28 at 02:12 -0700, Stephane Eranian wrote:
> > Daniel,
> > 
> > On Mon, Aug 27, 2007 at 04:07:54PM -0700, Daniel Walker wrote:
> > > On Mon, 2007-08-27 at 15:55 -0700, Stephane Eranian wrote:
> > > 
> > > > Yet the model name looks strange. So we need to run one more test,
> > > > as the fam/model is not enough. What we need to check is whether or
> > > > not this processor implements architectural perfmon or not.
> > > > 
> > > > Could you please compile and run the attached program and send me 
> > > > the output?
> > > 
> > > The output below is all the output ..
> > > 
> > > eax=0x7280201: version=1  num_cnt=2
> > > 
> > Then you have a Core Duo processor and the commit from Bjorn should
> > fix the problem. If it does not, then there is something else wrong.
> > Unfortunately, I do not have a Core Duo machine to try and reproduce.
> 
> There must be something else wrong, cause the problem persists .. As I
> said in past emails to Bjorn, I tested his commit in git, as well as the
> latest git all with the same issue (as well as bisecting git)..
> 
> If the hardware is buggy then we need some way to determine that..
> 
Could you instrument check_nmi_watchdog() to verify that the function
actually returns? Normally there is a safety mechanism in there.

Another possibility is that you are being flooded with NMI interrupts and
are not making forward progress.

> If this machine didn't support performance counters, what would happen
> then?
> 

If you have a local APIC and performance counters, then it will try to use them.
Otherwise, I suspect it falls back to NMI_IO_APIC (nmi_watchdog=1).
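
For reference, here is a minimal sketch of how the eax value quoted earlier
in this thread can be decoded. It assumes the value comes from CPUID leaf 0xA
(architectural perfmon) and uses the EAX field layout documented for that
leaf. The struct and function names are hypothetical and are not taken from
the kernel or from the test program mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of CPUID leaf 0xA, register EAX (architectural perfmon).
 * Hypothetical struct name, for illustration only. */
struct perfmon_eax {
    unsigned version;   /* bits  7:0  perfmon version ID */
    unsigned num_cnt;   /* bits 15:8  general-purpose counters per logical CPU */
    unsigned cnt_width; /* bits 23:16 bit width of each counter */
    unsigned ebx_len;   /* bits 31:24 length of the EBX event bit vector */
};

/* Split a raw EAX value into its four byte-wide fields. */
static struct perfmon_eax decode_perfmon_eax(uint32_t eax)
{
    struct perfmon_eax r;

    r.version   = eax & 0xff;
    r.num_cnt   = (eax >> 8) & 0xff;
    r.cnt_width = (eax >> 16) & 0xff;
    r.ebx_len   = (eax >> 24) & 0xff;
    return r;
}

/* Check that the value reported in this thread decodes to the
 * version and counter count printed next to it. */
void check_reported_value(void)
{
    struct perfmon_eax r = decode_perfmon_eax(0x7280201);

    assert(r.version == 1);
    assert(r.num_cnt == 2);
}
```

A non-zero version field is what indicates that architectural perfmon is
implemented. The remaining two bytes of the reported value decode to a
counter width of 40 bits and an EBX vector length of 7.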

-- 
-Stephane
