Date:	Mon, 08 Dec 2008 18:36:32 +0900
From:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To:	Giangiacomo Mariotti <gg.mariotti@...il.com>
CC:	Arjan van de Ven <arjan@...radead.org>,
	Robert Hancock <hancockr@...w.ca>,
	linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [HW PROBLEM] Intel I7 MCE. Erratum or not?

Giangiacomo Mariotti wrote:
> I still don't quite understand the logic behind this exception. It
> happens always only once per boot, right after booting always at [
> 301.7320xx], which clearly means that it's always triggered by the
> same instruction/s. It's about a "Generic CACHE Level-2 Data-Write
> Error", yet after that moment it never happens again until the next
> boot at the same relative time. The cache has a hardware problem, the
> process context is corrupted, but still after that single message I
> don't have any problem, my system works normally, even under very high
> pressure on cpu and memory. Is this normal? Should I try to limit the
> number of cpus used to only 1 (cpu0) in the BIOS and disable hyperthreading?
> That way I'd have a single physical and logical cpu, so probably if it
> has a hardware problem on the cache, the heaven will fall?

IIRC, this error did not happen at [301.7320xx] during boot; it
happened before the boot.  Since the record says "Processor context
corrupt," the MCE handler would have called panic() (or done something
else to stop the system) if the context had actually been corrupted
during the boot.
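
To make that reasoning concrete, here is a rough sketch of how the
PCC flag in a machine check bank status word relates to "stale record"
versus "live fatal error".  The bit positions follow the
IA32_MCi_STATUS layout in the Intel SDM; the function classify_bank()
and its found_at_boot argument are hypothetical names for illustration
only, and this is a simplified model, not the kernel's actual handler.

```python
# Selected bits of IA32_MCi_STATUS (Intel SDM, machine-check architecture).
MCI_STATUS_VAL = 1 << 63  # bank holds a valid error record
MCI_STATUS_UC = 1 << 61   # error was not corrected by hardware
MCI_STATUS_EN = 1 << 60   # error reporting was enabled for this error
MCI_STATUS_PCC = 1 << 57  # processor context corrupt


def classify_bank(status: int, found_at_boot: bool) -> str:
    """Hypothetical helper: decide what a bank's status word implies.

    found_at_boot is an assumption of this sketch: True when the record
    was already sitting in the bank while the kernel initialised MCE,
    False when it arrived as a machine check exception at run time.
    """
    if not status & MCI_STATUS_VAL:
        return "empty"
    if found_at_boot:
        # Left over from before this boot: it can only be logged.
        return "stale record, log only"
    if status & MCI_STATUS_PCC:
        # Context is really corrupt now: the system cannot continue.
        return "fatal, stop the system"
    return "log and continue"


# A record like the one reported: valid, uncorrected, enabled, PCC set.
reported = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN | MCI_STATUS_PCC
print(classify_bank(reported, found_at_boot=True))
print(classify_bank(reported, found_at_boot=False))
```

The same status word gives two different outcomes.  A machine that
keeps running after printing a PCC record is consistent only with the
first case, i.e. the record was already in the bank at boot.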

In other words, it seems that either 1) the error was recorded the last
time your machine crashed unexpectedly (due to a cosmic ray, etc.) and has
not been cleared yet, or 2) your machine is doing something wrong on every
reset/poweroff.

Could you try the "mce=nobootlog" boot option?

Thanks,
H.Seto
