[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <19f34abd0901251551j17231d82n306d2d5bfb26072a@mail.gmail.com>
Date: Mon, 26 Jan 2009 00:51:54 +0100
From: Vegard Nossum <vegard.nossum@...il.com>
To: Valdis.Kletnieks@...edu
Cc: Zdenek Kabelac <zdenek.kabelac@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Maciej W. Rozycki" <macro@...ux-mips.org>
Subject: Re: MCE error log
On Tue, Jan 6, 2009 at 7:42 PM, <Valdis.Kletnieks@...edu> wrote:
> On Tue, 06 Jan 2009 14:00:03 +0100, Zdenek Kabelac said:
>
>> CPU 1 BANK 128 TSC 57976afd
>
>> I could only see bank0ctl ... bank5ctl - so where is bank 128 ?
>
> I've had bank 128 reported before. Turned out it was for thermal events caused
> by dust bunnies clogging a cooling vent. I never did find an official
> statement that's what 128 is for, but I did find a bunch of hints....
>
> What does lm_sensors say the CPU temp is sitting at?
I get this also:
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 THERMAL EVENT TSC dc963a087
Processor core below trip temperature. Throttling disabled
STATUS 882c0100 MCGSTATUS 0
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 THERMAL EVENT TSC dc970c0d0
Processor core below trip temperature. Throttling disabled
STATUS 882d0200 MCGSTATUS 0
and in kernel log:
Machine check events logged
CPU0: Temperature/speed normal
CPU1: Temperature/speed normal
This is happening since I installed a x86_64 kernel instead of 32-bit.
Maybe this explains those weird (never fatal) APIC errors I always
used to get before (error 40, invalid vector received AFAIR)? In any
case, the APIC errors are not to be seen now, and the frequency of the
MCEs are about that of the APIC errors. What I can say is that it
seems they appear sooner when there is a lot of interrupts, e.g. disk
or network activity. What is the correlation?
Temperature seems completely normal whenever it happens:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +58.0°C (high = +100.0°C, crit = +100.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +59.0°C (high = +100.0°C, crit = +100.0°C)
Anyway, system works fine, so it's not much to worry about. But I am curious...
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists