Date:	Mon, 17 Jul 2006 00:55:16 -0700
From:	"Vikas Kedia" <kedia.vikas@...il.com>
To:	"Andreas Mohr" <andi@...x01.fht-esslingen.de>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: kernel panic at load average of 24 is it acceptable ?

> Read up on MCE debugging methods on Linux or so, that should hopefully help.

Here is the output of mcelog:
root@...1:~# less /var/log/mcelog
MCE 0
CPU 0 0 data cache TSC 6988ae18046
ADDR f87f5ec0
  Data cache ECC error (syndrome ce)
       bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS 9467400000000833 MCGSTATUS 0
MCE 0
CPU 0 0 data cache TSC 723b38a3633
ADDR 3d9fc0
  Data cache ECC error (syndrome ce)
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      data read mem transaction
      memory access, level generic'
STATUS d467400000000833 MCGSTATUS 0
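
For anyone who wants to check the two STATUS values by hand, here is a minimal Python sketch that pulls out the bits mcelog reports above. The bit positions are assumptions: bits 63 to 57 follow the usual x86 machine-check status layout, while bit 46 (corrected ECC) and the syndrome in bits 54:47 are inferred from this log, where they reproduce "syndrome ce". The name decode_status is hypothetical and is not part of mcelog.

```python
# Hypothetical helper, not part of mcelog: decode an MCi_STATUS value.
# Bit positions are assumptions (see the note above this block).
STATUS_FLAGS = {
    63: "valid",            # register holds a valid error
    62: "overflow",         # further errors arrived before this one was read
    61: "uncorrected",      # hardware could not correct the error
    60: "enabled",          # error reporting was enabled
    58: "address_valid",    # the ADDR field is meaningful
    57: "context_corrupt",  # processor context may be corrupt
    46: "corrected_ecc",    # ECC corrected the error (bit46 in the log)
}

def decode_status(status):
    """Split a 64-bit status value into flags, ECC syndrome and error code."""
    flags = {name: bool((status >> bit) & 1)
             for bit, name in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "syndrome": (status >> 47) & 0xFF,  # assumed to sit in bits 54:47
        "error_code": status & 0xFFFF,      # low 16 bits: MCA error code
    }

if __name__ == "__main__":
    # The two STATUS values from the mcelog output above.
    for raw in ("9467400000000833", "d467400000000833"):
        info = decode_status(int(raw, 16))
        set_flags = sorted(n for n, on in info["flags"].items() if on)
        print(raw, set_flags,
              "syndrome=%x" % info["syndrome"],
              "code=%x" % info["error_code"])
```

Under these assumptions both records have the corrected-ECC bit set and the uncorrected bit clear, and they differ only in the overflow bit. Both were also logged as "data cache" errors on CPU 0.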

Since mcelog shows an ECC error, is the hypothesis correct that it is a RAM
problem, and that replacing the RAM should solve it?

Regards,

Vikas

On 7/17/06, Andreas Mohr <andi@...x01.fht-esslingen.de> wrote:
> Hi,
>
> On Mon, Jul 17, 2006 at 12:08:41AM -0700, Vikas Kedia wrote:
> > The memtest ran fine for 8 hours:
> > http://www.visitlab.com/styles/basic/images/memtest.JPG
> >
> > My questions are:
> > 1. Is a kernel panic at a load average of 24 acceptable?
>
> Kernel panic is _NEVER_ acceptable.
> I've seen loads in the couple hundreds with no problem.
>
> However you actually have a mce_panic() crash here.
> Make sure to figure out why this Machine Check Exception got raised,
> otherwise you might hose the box if you continue without investigation.
> It could easily be due to a malfunctioning CPU fan or the like, especially
> since it happened exactly while you stress-tested the machine.
>
> > 2. If not, how do I go about debugging this kernel panic?
>
> Read up on MCE debugging methods on Linux or so, that should hopefully help.
>
> Good luck!
>
> Andreas Mohr
>
