linux-kernel - Re: Can we ignore errors in mcelog if the server is running fine

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-id: <44C9CC21.9040609@shaw.ca>
Date:	Fri, 28 Jul 2006 02:34:41 -0600
From:	Robert Hancock <hancockr@...w.ca>
To:	Handle X <xhandle@...il.com>
Cc:	Vikas Kedia <kedia.vikas@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: Can we ignore errors in mcelog if the server is running fine

Handle X wrote:
> On 7/27/06, Robert Hancock <hancockr@...w.ca> wrote:
>> Vikas Kedia wrote:
>> > The server seems to be running fine. A. can I ignore the following
>> > mcelog errors ? B. If not what should i do to stop the server from
>> > reporting mcelog errors.
>>
>> Looks like data cache ECC errors, meaning the CPU 0 is faulty.
>> Eventually if it's not replaced there will likely be some uncorrectable
>> errors and the system will likely crash.
> 
> I am facing similar, but different errors.
> 
> [root@...yxsrv ~]# mcelog
> MCE 0
> HARDWARE ERROR. This is *NOT* a software problem!
> Please contact your hardware vendor
> CPU 1 4 northbridge TSC 89a560bb249
> ADDR 1dfa49690
>  Northbridge Chipkill ECC error
>  Chipkill ECC syndrome = 2021
>       bit46 = corrected ecc error
>  bus error 'local node response, request didn't time out
>      generic read mem transaction
>      memory access, level generic'
> STATUS 9410c00020080a13 MCGSTATUS 0

> Repeats whenever I do any kind of operations...
> How severe is ChipKill errors? Should I consider throwing away CPU 1
> and get another one.

That sounds to me more like some of the RAM attached to CPU1 is bad..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@...pamshaw.ca
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/