Message-ID: <12bfabe40812080004p7438744eqeb884b42673bd73c@mail.gmail.com>
Date:	Mon, 8 Dec 2008 09:04:32 +0100
From:	"Giangiacomo Mariotti" <gg.mariotti@...il.com>
To:	"Hidetoshi Seto" <seto.hidetoshi@...fujitsu.com>
Cc:	"Arjan van de Ven" <arjan@...radead.org>,
	"Robert Hancock" <hancockr@...w.ca>, linux-kernel@...r.kernel.org,
	"Andi Kleen" <ak@...ux.intel.com>
Subject: Re: [HW PROBLEM] Intel I7 MCE. Erratum or not?

On Mon, Dec 8, 2008 at 8:42 AM, Hidetoshi Seto
<seto.hidetoshi@...fujitsu.com> wrote:
> Giangiacomo Mariotti wrote:
>> I noticed something else, which though may be due to my inexperience
>> with mce messages. In my directory /sys/devices/system/machinecheck
>> there are machinecheck0-7(one for each logical cpu of my system I
>> presume). Having received the MCE log always for cpu 0, I went to look
>> inside dir machinecheck0 and I found bank0-5ctl. So now my question
>> is, why do I receive MCE logs about bank 6, if my cpus don't have a
>> bank 6? Does that count start from 1? Or am I missing something else?
>
> Answer would be in the following commit:
>
>> commit 8edc5cc5ec880c96de8e6686fb0d7a5231e91c05
>> Author: Venki Pallipadi <venkatesh.pallipadi@...el.com>
>> Date:   Mon May 12 15:43:34 2008 +0200
>>
>>     x86: remove 6 bank limitation in 64 bit MCE reporting code
> (snip)
>>     The patch below does not create sysfs control (bankNctl) for banks
>>     higher than 6 as well. That needs some pre-cleanup in /sysfs mce layout,
>>     removal of per cpu /sysfs entries for bankctl as they are really global
>>     system level control today. That change will follow. This basic change
>>     is critical to report the detailed errors on banks higher than 6.
>
> So there are 6 sysfs control(bank0-5ctl) even if your cpu have more banks.
>
> Old kernel with bank limitation will say:
> "MCE: warning: using only %d banks\n"
> And it seems that old kernel will ignore records in banks higher than 6.
>
> Thanks,
> H.Seto
>
>
I see, thanks for the info.
I still don't quite understand the logic behind this exception. It
happens exactly once per boot, always at [ 301.7320xx], which clearly
means that it's always triggered by the same instruction(s). It's
reported as a "Generic CACHE Level-2 Data-Write Error", yet after that
moment it never happens again until the next boot, at the same relative
time. The cache has a hardware problem and the process context is
corrupted, yet after that single message I don't have any problems: my
system works normally, even under very high load on CPU and memory. Is
this normal? Should I try limiting the number of CPUs to just one (cpu0)
in the BIOS and disabling hyperthreading? That way I'd have a single
physical and logical CPU, so if the cache really does have a hardware
problem, presumably the sky would fall?
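
For anyone comparing the sysfs layout with the bank number in an MCE
log, here is a minimal Python sketch. It assumes the bankNctl naming and
the machinecheck0 directory mentioned above; the helper name
list_bank_controls is made up for illustration. As the quoted commit
message says, a missing control file for a bank does not mean the CPU
lacks that bank.

```python
import re
from pathlib import Path

# Matches sysfs control files such as bank0ctl, bank5ctl.
BANK_CTL = re.compile(r"^bank(\d+)ctl$")


def list_bank_controls(cpu_dir):
    """Return the sorted bank numbers that have a bankNctl file in cpu_dir.

    Hypothetical helper: it only inspects file names and reads nothing
    from the hardware.
    """
    path = Path(cpu_dir)
    if not path.is_dir():
        return []
    banks = []
    for entry in path.iterdir():
        match = BANK_CTL.match(entry.name)
        if match:
            banks.append(int(match.group(1)))
    return sorted(banks)


if __name__ == "__main__":
    # Path taken from the message above; it may not exist on other kernels.
    cpu0 = "/sys/devices/system/machinecheck/machinecheck0"
    print("banks with a sysfs control file:", list_bank_controls(cpu0))
```

On the system described above this should list banks 0 through 5, while
the MCE log can still name bank 6, because the kernel reports on banks
that have no sysfs control entry.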
