Message-ID: <fbe022af0607170055x7fefdf9bg63ea77768480935a@mail.gmail.com>
Date: Mon, 17 Jul 2006 00:55:16 -0700
From: "Vikas Kedia" <kedia.vikas@...il.com>
To: "Andreas Mohr" <andi@...x01.fht-esslingen.de>
Cc: linux-kernel@...r.kernel.org
Subject: Re: kernel panic at load average of 24 is it acceptable ?
> Read up on MCE debugging methods on Linux; that should hopefully help.
Here is the output of mcelog:
root@...1:~# less /var/log/mcelog
MCE 0
CPU 0 0 data cache TSC 6988ae18046
ADDR f87f5ec0
Data cache ECC error (syndrome ce)
bit46 = corrected ecc error
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS 9467400000000833 MCGSTATUS 0
MCE 0
CPU 0 0 data cache TSC 723b38a3633
ADDR 3d9fc0
Data cache ECC error (syndrome ce)
bit46 = corrected ecc error
bit62 = error overflow (multiple errors)
bus error 'local node origin, request didn't time out
data read mem transaction
memory access, level generic'
STATUS d467400000000833 MCGSTATUS 0
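The two STATUS values above can also be decoded by hand, which shows where mcelog's text comes from. A minimal sketch follows. It assumes the architectural MCi_STATUS layout from the Intel/AMD manuals for the flag bits 57-63 and for the low 16-bit compound error code (bus-error format 0000 1PPT RRRR IILL). It also assumes an AMD K8 data-cache layout for bit 46 (corrected ECC) and bits 54:47 (ECC syndrome); those two fields are taken as an assumption only because they agree with mcelog's own "bit46" and "syndrome ce" lines. The function `decode()` and the table names are illustrative, not part of mcelog.

```python
"""Decode the MCi_STATUS values printed by mcelog (illustrative sketch).

Flag bits 57-63 and the compound error code follow the architectural
machine-check layout. Bit 46 and bits 54:47 are ASSUMED to be the AMD K8
data-cache CECC flag and ECC syndrome, matching mcelog's decoding.
"""

# Architectural MCi_STATUS flag bits.
FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# Fields of the bus-error compound code: 0000 1PPT RRRR IILL.
PARTICIPATION = {0: "local node origin", 1: "local node response",
                 2: "local node observed", 3: "generic"}
REQUEST = {0: "generic", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "eviction", 8: "snoop"}
MEM_IO = {0: "memory access", 1: "reserved", 2: "I/O", 3: "other"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "level generic"}


def decode(status: int) -> dict:
    """Split one 64-bit STATUS value into flags, K8 fields and bus-error fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # architectural MCA error code (low 16 bits)
    result = {
        "flags": flags,
        "corrected_ecc": bool((status >> 46) & 1),  # assumed K8 CECC bit
        "syndrome": (status >> 47) & 0xFF,          # assumed K8 syndrome, bits 54:47
        "mca_code": code,
    }
    # Bus-error format: bits 15:12 clear and bit 11 set.
    if code & 0xF800 == 0x0800:
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 0x3],
            "level": LEVEL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    # The two STATUS values from the mcelog output above.
    for status in (0x9467400000000833, 0xD467400000000833):
        d = decode(status)
        print(hex(status))
        print("  flags:    ", ", ".join(d["flags"]))
        print("  CECC:     ", d["corrected_ecc"], " syndrome:", hex(d["syndrome"]))
        print("  bus error:", d.get("bus_error"))
```

Both values decode to VAL, EN and ADDRV with the same syndrome (0xce) and the same bus-error description (local node origin, no timeout, data read, memory access, level generic); the only difference is that the second one also has OVER set, which is what mcelog reports as "bit62 = error overflow". Neither value has UC or PCC set.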
Since it shows an ECC error, is my hypothesis correct that this is a RAM
problem, and that replacing the RAM should solve it?
Regards,
Vikas
On 7/17/06, Andreas Mohr <andi@...x01.fht-esslingen.de> wrote:
> Hi,
>
> On Mon, Jul 17, 2006 at 12:08:41AM -0700, Vikas Kedia wrote:
> > The memtest ran fine for 8 hours:
> > http://www.visitlab.com/styles/basic/images/memtest.JPG
> >
> > My questions are:
> > 1. Is a kernel panic at a load average of 24 acceptable?
>
> Kernel panic is _NEVER_ acceptable.
> I've seen load averages of a couple hundred with no problem.
>
> However, you actually have an mce_panic() crash here.
> Make sure to figure out why this Machine Check Exception got raised,
> otherwise you might hose the box if you continue without investigation.
> It could easily be due to a malfunctioning CPU fan or similar, especially
> since it happened exactly while you were stress-testing the machine.
>
> > 2. If not, how do I go about debugging this kernel panic?
>
> Read up on MCE debugging methods on Linux; that should hopefully help.
>
> Good luck!
>
> Andreas Mohr
>