Message-ID: <6de39a910607281113hb657981jce19f15806e82104@mail.gmail.com>
Date: Fri, 28 Jul 2006 11:13:48 -0700
From: "Handle X" <xhandle@...il.com>
To: "Robert Hancock" <hancockr@...w.ca>
Cc: "Vikas Kedia" <kedia.vikas@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: Can we ignore errors in mcelog if the server is running fine
> > [root@...yxsrv ~]# mcelog
> > MCE 0
> > HARDWARE ERROR. This is *NOT* a software problem!
> > Please contact your hardware vendor
> > CPU 1 4 northbridge TSC 89a560bb249
> > ADDR 1dfa49690
> > Northbridge Chipkill ECC error
> > Chipkill ECC syndrome = 2021
> > bit46 = corrected ecc error
> > bus error 'local node response, request didn't time out
> > generic read mem transaction
> > memory access, level generic'
> > STATUS 9410c00020080a13 MCGSTATUS 0
>
> > Repeats whenever I do any kind of operations...
> > How severe is ChipKill errors? Should I consider throwing away CPU 1
> > and get another one.
>
> That sounds to me more like some of the RAM attached to CPU1 is bad..
I took out CPU1 and the errors went away, but so did half of the RAM
(the half accessible only to CPU1).
Okay, I will swap the RAM of CPU0 with that of CPU1 and test again. If
I get the messages again, I will replace the RAM.
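
For reference, the STATUS word in the quoted log can be decoded by hand to
confirm that this was a corrected (not uncorrected) error. The following is
only a rough sketch: the flag positions in bits 63..57 are the usual
machine-check status bits, bit 46 is taken from the mcelog output itself, and
the Chipkill syndrome layout (low byte in bits 54..47, high byte in bits
31..24) is my assumption about the K8 northbridge register, checked only
against the "syndrome = 2021" line above. The names `FLAGS` and `decode` are
made up for this example.

```python
# Sketch: decode a few fields of the STATUS value from the quoted mcelog output.
STATUS = 0x9410C00020080A13

# Common machine-check status flags, keyed by bit position.
FLAGS = {
    63: "valid",
    62: "overflow",
    61: "uncorrected",
    60: "enabled",
    59: "misc valid",
    58: "addr valid",
    57: "processor context corrupt",
}


def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def decode(status):
    """Return the set flags, the corrected-ECC bit, and the Chipkill syndrome."""
    flags = [name for pos, name in FLAGS.items() if bit(status, pos)]
    # Bit 46 is reported by mcelog as "corrected ecc error".
    corrected_ecc = bool(bit(status, 46))
    # Assumed layout: syndrome[7:0] in bits 54..47, syndrome[15:8] in bits 31..24.
    syndrome = (((status >> 24) & 0xFF) << 8) | ((status >> 47) & 0xFF)
    return {
        "flags": flags,
        "corrected_ecc": corrected_ecc,
        "chipkill_syndrome": syndrome,
    }


if __name__ == "__main__":
    info = decode(STATUS)
    print("flags:", ", ".join(info["flags"]))
    print("corrected ECC:", info["corrected_ecc"])
    print("chipkill syndrome: %x" % info["chipkill_syndrome"])
```

With the value from the log, the "uncorrected" flag is clear and the
corrected-ECC bit is set, which fits the hardware having fixed the error on
the fly while still pointing at a memory fault worth chasing.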
Thanks.
Om.