Date:	Thu, 27 Jul 2006 22:28:18 -0700
From:	"Handle X" <xhandle@...il.com>
To:	"Robert Hancock" <hancockr@...w.ca>
Cc:	"Vikas Kedia" <kedia.vikas@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: Can we ignore errors in mcelog if the server is running fine

On 7/27/06, Robert Hancock <hancockr@...w.ca> wrote:
> Vikas Kedia wrote:
> > The server seems to be running fine. A. can I ignore the following
> > mcelog errors ? B. If not what should i do to stop the server from
> > reporting mcelog errors.
>
> Looks like data cache ECC errors, meaning the CPU 0 is faulty.
> Eventually if it's not replaced there will likely be some uncorrectable
> errors and the system will likely crash.

I am seeing similar, but not identical, errors.

[root@...yxsrv ~]# mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC 89a560bb249
ADDR 1dfa49690
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 2021
       bit46 = corrected ecc error
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9410c00020080a13 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC a6550f2d4de
ADDR 1de74b670
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 2021
       bit32 = err cpu0
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9410c00120080813 MCGSTATUS 0
MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC afe4eba238a
ADDR 1d8049698
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 2021
       bit46 = corrected ecc error
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9410c00020080a13 MCGSTATUS 0
MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge TSC cc945738d0a
ADDR 194c4b670
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 2021
       bit40 = error found by scrub
       bit46 = corrected ecc error
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9410c10020080a13 MCGSTATUS 0

These errors repeat whenever I do any kind of operation.
How severe are Chipkill errors? Should I consider throwing away CPU 1
and getting another one?
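
For reference, the flags mcelog prints can be read straight off the STATUS values. Below is a minimal Python sketch, not anything mcelog itself ships. Bits 32, 40 and 46 are the ones named in the output above; bits 63, 61 and 58 are the generic x86 machine-check status flags (valid, uncorrected, address valid). The syndrome layout (low byte in bits 54:47, high byte in bits 31:24) is an assumption on my part; it does reproduce the 2021 shown above. The names FLAGS, decode_status and chipkill_syndrome are made up for this example.

```python
# Flag names for bits 32, 40 and 46 are copied from the mcelog output above.
# Bits 63, 61 and 58 are the generic machine-check status flags.
FLAGS = {
    63: "valid",
    61: "uncorrected",
    58: "address valid",
    46: "corrected ecc error",
    40: "error found by scrub",
    32: "err cpu0",
}


def decode_status(status_hex):
    """Return the names of the flags set in a STATUS value given as hex text."""
    value = int(status_hex, 16)
    return [name
            for bit, name in sorted(FLAGS.items(), reverse=True)
            if (value >> bit) & 1]


def chipkill_syndrome(status_hex):
    """Rebuild the 16-bit syndrome (assumed layout: bits 54:47 and 31:24)."""
    value = int(status_hex, 16)
    low = (value >> 47) & 0xFF    # assumed: syndrome bits 7:0
    high = (value >> 24) & 0xFF   # assumed: syndrome bits 15:8
    return (high << 8) | low


if __name__ == "__main__":
    for status in ("9410c00020080a13", "9410c00120080813", "9410c10020080a13"):
        flags = ", ".join(decode_status(status))
        print(status, "syndrome", format(chipkill_syndrome(status), "x"), "|", flags)
```

In all four records the "uncorrected" bit (61) is clear and the "corrected ecc error" bit (46) is set, which matches what mcelog reports.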

Regards,
Om.