Date:	Tue, 7 Nov 2006 21:35:32 -0800
From:	Randy Dunlap <rdunlap@...otime.net>
To:	Bryce Harrington <bryce@...l.org>
Cc:	Pavel Machek <pavel@....cz>, Andrew Morton <akpm@...l.org>,
	vatsa@...ibm.com, torvalds@...l.org, linux-kernel@...r.kernel.org,
	shaohua.li@...el.com, hotplug_sig@...l.org,
	lhcs-devel@...ts.sourceforge.net
Subject: Re: Status on CPU hotplug issues

On Mon, 23 Oct 2006 15:26:24 -0700 Bryce Harrington wrote:

> On Mon, Oct 09, 2006 at 02:40:24PM -0700, Randy Dunlap wrote:
> > On Sat, 7 Oct 2006 21:57:49 +0000 Pavel Machek wrote:
> > 
> > > Hi!
> > > 
> > > > 1.  Oops offlining cpu twice on AMD64 (but not on EM64t)
> > > >     with the 2.6.18-git22 kernel
> > > > 
> > > >     Reported to hotplug lists 10/05:
> > > >       http://lists.osdl.org/pipermail/hotplug_sig/2006-October/000680.html
> > > > 
> > > >     To recreate: offline, online, and then offline a CPU, then oopses
> > > >       http://crucible.osdl.org/runs/2397/sysinfo/amd01.console
> > > >       http://crucible.osdl.org/runs/2397/sysinfo/amd01.2/proc/config
> > > > 
> > > >     Here's a snippet of the oops:
> > > > 
> > > > # echo 0 > /sys/devices/system/cpu/cpu1/online
> > > > 
> > > >  Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> > > >  [<ffffffff80255287>] __drain_pages+0x29/0x5f
> > > > PGD 7e56d067 PUD 7ee80067 PMD 0
> > > > Oops: 0000 [1] PREEMPT SMP
> > > > CPU 0
> > > > Modules linked in:
> > > > Pid: 7203, comm: bash Tainted: G   M  2.6.18-git22 #1
> > >                                  ~~~~~
> > > kernel is unhappy here. Forced module unload?
> > 
> > Machine check exception.  'G' is Good, same place where 'P'
> > for proprietary would be.  But yes, kernel or machine is unhappy.
> 
> To followup on this issue...
> 
> I found a BIOS update for the motherboard of this machine indicating it
> includes a fix for MCE during hibernate operations; my guess is that
> cpu hotplug may be triggering this bug.
> 
> Meanwhile, we checked against a couple other different AMD64 systems;
> these are behaving correctly.
> 
> Anyway, thanks for the pointers, it sounds like this is probably just a
> hardware issue.  I'll report back if I find differently.
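
For reference, the "Tainted: G   M" field discussed above can be decoded
mechanically.  The following is a minimal sketch, assuming the taint
string of 2.6.18-era kernels has one character per slot in the order
P/G, F, S, R, M, B, with a space meaning the flag is not set.  The table
and the decode_taint() helper are illustrative and not taken from the
kernel source; later kernels define more flags.

```python
# Assumed slot layout for 2.6.18-era kernels; a space means "not set".
# In the first slot 'G' means no proprietary module was loaded,
# so only 'P' is reported as a taint reason.
TAINT_SLOTS = [
    ("P", "proprietary module loaded"),
    ("F", "module load was forced"),
    ("S", "SMP kernel on CPUs not designed for SMP"),
    ("R", "module unload was forced"),
    ("M", "machine check exception occurred"),
    ("B", "bad page state encountered"),
]


def decode_taint(field):
    """Return descriptions of the flags set in a 'Tainted:' field."""
    reasons = []
    for index, (letter, meaning) in enumerate(TAINT_SLOTS):
        # Slots past the end of the string are treated as unset.
        char = field[index] if index < len(field) else " "
        if char == letter:
            reasons.append(meaning)
    return reasons


if __name__ == "__main__":
    # The field from the oops quoted in this thread.
    print(decode_taint("G   M"))
```

Under that assumed layout, the 'M' sits in the machine-check slot and the
forced-unload slot ('R') is empty.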

Regarding the MCE (which the BIOS update did not fix on this one
particular machine), I don't see the actual Machine Check Exception
message anywhere in the kernel log.  It would have been logged before
the oops that this CPU hotplug test prints.  Is the complete
kernel log available, or can it be reproduced?
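
If a complete log does turn up, the check described above (look for a
machine check message that precedes the oops) could be sketched roughly
as follows.  This is only an illustration: the function name is
hypothetical, and matching on the substring "machine check" is an
assumption, since the exact wording of the MCE message depends on the
kernel version and architecture.

```python
def mce_lines_before_oops(log_text):
    """Return lines mentioning a machine check that precede the first oops."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.strip().lower()
        # Stop at the first sign of the oops; only earlier lines matter here.
        if "unable to handle kernel" in lowered or lowered.startswith("oops:"):
            break
        # Assumption: MCE reports contain the words "machine check".
        if "machine check" in lowered:
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < console.log
    for hit in mce_lines_before_oops(sys.stdin.read()):
        print(hit)
```

An empty result on a complete log would suggest the taint flag was set
before the captured portion of the log begins.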

---
~Randy
