Date:	Tue, 7 Nov 2006 21:52:56 -0800
From:	Bryce Harrington <bryce@...l.org>
To:	Randy Dunlap <rdunlap@...otime.net>
Cc:	Pavel Machek <pavel@....cz>, Andrew Morton <akpm@...l.org>,
	vatsa@...ibm.com, torvalds@...l.org, linux-kernel@...r.kernel.org,
	shaohua.li@...el.com, hotplug_sig@...l.org,
	lhcs-devel@...ts.sourceforge.net
Subject: Re: Status on CPU hotplug issues

On Tue, Nov 07, 2006 at 09:35:32PM -0800, Randy Dunlap wrote:
> On Mon, 23 Oct 2006 15:26:24 -0700 Bryce Harrington wrote:
> 
> > On Mon, Oct 09, 2006 at 02:40:24PM -0700, Randy Dunlap wrote:
> > > On Sat, 7 Oct 2006 21:57:49 +0000 Pavel Machek wrote:
> > > 
> > > > Hi!
> > > > 
> > > > > 1.  Oops offlining cpu twice on AMD64 (but not on EM64T)
> > > > >     with the 2.6.18-git22 kernel
> > > > > 
> > > > >     Reported to hotplug lists 10/05:
> > > > >       http://lists.osdl.org/pipermail/hotplug_sig/2006-October/000680.html
> > > > > 
> > > > >     To recreate: offline, online, and then offline a CPU, then oopses
> > > > >       http://crucible.osdl.org/runs/2397/sysinfo/amd01.console
> > > > >       http://crucible.osdl.org/runs/2397/sysinfo/amd01.2/proc/config
> > > > > 
> > > > >     Here's a snippet of the oops:
> > > > > 
> > > > > # echo 0 > /sys/devices/system/cpu/cpu1/online
> > > > > 
> > > > >  Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > > >  [<ffffffff80255287>] __drain_pages+0x29/0x5f
> > > > > PGD 7e56d067 PUD 7ee80067 PMD 0
> > > > > Oops: 0000 [1] PREEMPT SMP
> > > > > CPU 0
> > > > > Modules linked in:
> > > > > Pid: 7203, comm: bash Tainted: G   M  2.6.18-git22 #1
> > > >                                  ~~~~~
> > > > kernel is unhappy here. Forced module unload?
> > > 
> > > Machine check exception.  'G' is Good, same place where 'P'
> > > for proprietary would be.  But yes, kernel or machine is unhappy.
> > 
> > To follow up on this issue...
> > 
> > I found a BIOS update for the motherboard of this machine indicating it
> > includes a fix for MCE during hibernate operations; my guess is that
> > cpu hotplug may be triggering this bug.
> > 
> > Meanwhile, we checked against a couple other different AMD64 systems;
> > these are behaving correctly.
> > 
> > Anyway, thanks for the pointers, it sounds like this is probably just a
> > hardware issue.  I'll report back if I find differently.
> 
> Regarding the MCE (that the BIOS update did not fix for this one
> particular machine), I don't see the actual Machine Check Exception
> kernel message log anywhere.  It would have happened before the
> oops that is printed by this CPU hotplug test.  Is the complete
> kernel log available or can it be reproduced?

I never could actually get a bona fide MCE error message; the only clue
that it was an MCE was your reading of the line you highlighted above.
All the info we have is what we posted.
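
For anyone reading along, here is a minimal sketch of how the taint field
on that "Pid:" line could be decoded. It assumes only the three flags
discussed in this thread ('G', 'P', 'M'); the kernel defines others, and
the function name and flag table are made up for illustration.

```python
import re

# Only the flags mentioned in this thread; the kernel defines more.
TAINT_FLAGS = {
    "G": "no proprietary modules loaded",
    "P": "proprietary module loaded",
    "M": "machine check exception occurred",
}


def decode_taint(oops_line):
    """Return (flag, meaning) pairs found in the 'Tainted:' field of a line.

    Returns an empty list when the line has no 'Tainted:' field.
    """
    # Capture the letters and spaces between 'Tainted:' and the kernel version.
    match = re.search(r"Tainted:\s*([A-Z ]+?)\s+\d", oops_line)
    if match is None:
        return []
    flags = [ch for ch in match.group(1) if ch != " "]
    return [(flag, TAINT_FLAGS.get(flag, "not covered in this sketch"))
            for flag in flags]


if __name__ == "__main__":
    line = "Pid: 7203, comm: bash Tainted: G   M  2.6.18-git22 #1"
    for flag, meaning in decode_taint(line):
        print(flag, "-", meaning)
```

Run against the line from the oops, this reports both 'G' and 'M', which
matches Randy's reading that the 'M' is the only evidence of a machine check.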

Bryce