[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20080215145007.GA18341@janus>
Date: Fri, 15 Feb 2008 15:50:07 +0100
From: Frank van Maarseveen <frankvm@...nkvm.com>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Machine check exception with a kernel dependency
On Fri, Feb 15, 2008 at 01:22:41PM +0000, Alan Cox wrote:
> On Wed, 13 Feb 2008 17:25:28 +0100
> Frank van Maarseveen <frankvm@...nkvm.com> wrote:
>
> > On at least two Dell optiplex 755 systems with a Core 2 Duo I get
> >
> > Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0000000000000004
> > Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0000000000000005
> > Feb 13 15:14:01 inari Bank 0: b200004000000800
> > Feb 13 15:14:01 inari Bank 5: b200221024080400
> >
> > 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> > it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> > didn't help either.
>
> If you run the MCE numbers through a decoder what do you get back ?
I've some trouble decoding these in a convincing way. mcelog --core2
--ascii reports "MCG status:RIPV MCIP" for 0000000000000005 and "MCG
status:MCIP" for 0000000000000004.
I've collected several Bank # output lines:
# text
---------------------------
26 Bank 0: b200004000000800
10 Bank 5: b200121014040400
8 Bank 5: b200121020080400
4 Bank 5: b200221010040400
4 Bank 5: b200221024080400
but mcelog expects lines of the format
CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx
(they got broken by netconsole) so I made these up:
CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
result:
CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout)
STATUS b200004000000800 MCGSTATUS 4
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121014040400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121020080400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
The problem also exists on an entirely different Xeon system with 4 cores:
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU X3210 @ 2.13GHz
stepping : 11
--
Frank
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists