Message-ID: <20060915182915.GR4610@chain.digitalkingdom.org>
Date: Fri, 15 Sep 2006 11:29:15 -0700
From: Robin Lee Powell <rlpowell@...italkingdom.org>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Same MCE on 4 working machines (was Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM)
On Fri, Sep 15, 2006 at 12:45:42PM +0100, Alan Cox wrote:
> On Thu, 2006-09-14 at 12:05 -0700, Robin Lee Powell wrote:
> > NET: Registered protocol family 16
> > CPU 0: Machine Check Exception: 7 Bank 3: b40000000000083b
> > RIP 10:<ffffffff8023a44c> {pci_conf1_read+0xac/0xe0}
> > TSC d189cea ADDR fdfc000cfe
>
> We went to do a PCI configuration cycle and your box blew up.
> That's pretty clear. Could be down to the various changes in how we
> do PCI accesses tripping up a problem box, or triggering a bug.
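For reference, here is a minimal sketch of how a PCI configuration mechanism #1 ("conf1") access is addressed, which is the method pci_conf1_read() in the RIP above is named after. The CPU writes an address word to I/O port 0xCF8 and then reads the data through ports 0xCFC to 0xCFF, with the low two register bits selecting the byte lane. The helper names and the bus/device/function values are hypothetical illustrations, not the kernel's code. The MCE ADDR values in this thread end in ...cfe and ...cfc, which would be consistent with data-port reads at byte offsets 2 and 0; treating the leading fdfc000000 as the processor's I/O-space window is an assumption.

```python
# Hypothetical sketch of PCI configuration mechanism #1 ("conf1") addressing.
# Function names and the example bus/device/function/register are illustrative only.

CONFIG_ADDRESS = 0xCF8  # I/O port that receives the 32-bit configuration address
CONFIG_DATA = 0xCFC     # base of the 4-byte data window (ports 0xCFC..0xCFF)


def conf1_address(bus: int, dev: int, fn: int, reg: int) -> int:
    """Value written to CONFIG_ADDRESS before a conf1 read or write."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8 and 0 <= reg < 256):
        raise ValueError("bus/device/function/register out of range")
    # Bit 31 enables the cycle; the register is dword-aligned (low 2 bits cleared).
    return (1 << 31) | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


def conf1_data_port(reg: int) -> int:
    """I/O port to read: the low two register bits pick the byte lane."""
    return CONFIG_DATA + (reg & 3)


if __name__ == "__main__":
    # Example: 16-bit device ID (register 0x02) of bus 0, device 0, function 0.
    print(hex(conf1_address(0, 0, 0, 0x02)))
    print(hex(conf1_data_port(0x02)))
```

A read of register 0x02 goes through port 0xCFE, and a dword-aligned register through 0xCFC, which is why the two panics above can report different low bits in ADDR for the same kind of access.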
*nod* I'm totally on the fence about that; the company that made
these boxes (Penguin Computing) seems to have some clue issues, and
the motherboard is an Arima (sp?) HDAMA v2, which I gather is one of
the very earliest SMP Opteron boards.
Note that with the answers I give below I'm using the kernel that
hangs at:
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
for the first CPU, and that doesn't generate an MCE unless I use
acpi=off, so I'll be doing each option twice (once with just the
option you gave, and once with it plus acpi=off).
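The test plan above (each suggested pci= option tried once on its own and once together with acpi=off) can be sketched as a small helper that enumerates the kernel command lines. The option list comes from Alan's suggestions below; the helper name and the base command line "root=/dev/sda2 ro" are placeholders, not this machine's actual boot configuration.

```python
# Hypothetical helper: enumerate the boot command lines for the test matrix.
from itertools import product

PCI_OPTIONS = ["pci=bios", "pci=conf1", "pci=conf2", "pci=nommconf", "pci=nomsi"]
ACPI_VARIANTS = ["", "acpi=off"]  # "" means: leave ACPI at its default


def boot_variants(base: str) -> list[str]:
    """Return one command line per (pci option, ACPI variant) pair."""
    lines = []
    for pci, acpi in product(PCI_OPTIONS, ACPI_VARIANTS):
        extra = " ".join(opt for opt in (pci, acpi) if opt)
        lines.append(f"{base} {extra}")
    return lines


if __name__ == "__main__":
    # Placeholder base command line, not taken from the machine in question.
    for cmdline in boot_variants("root=/dev/sda2 ro"):
        print(cmdline)
```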
This is my 2.6.17.11 kernel; the Debian 2.6.8-12 kernel gets to the
MCE without any options; dunno why yet.
The MCE this kernel gives is:
HARDWARE ERROR
CPU 0: Machine Check Exception: 7 Bank 3: b40000000000083b
RIP 10:<ffffffff80308e7e> {pci_conf1_read+0xbe/0xf0}
TSC 1a0c706340 ADDR fdfc000cfc
This is not a software problem!
Run through mcelog --ascii to decode and contact your hardware vendor
Kernel panic - not syncing: Uncorrected machine check
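Until mcelog is run, the status value printed after "Bank 3:" can be partially decoded by hand. The sketch below covers only the architectural MCi_STATUS flag bits and the generic "bus/interconnect" error-code format (0000 1PPT RRRR IILL) from the x86 machine-check architecture; vendor-specific bits are ignored, so mcelog remains the authoritative decoder. The function and table names are illustrative.

```python
# Minimal decoder for the architectural fields of an x86 MCi_STATUS value.
# Vendor-specific bits (roughly 16..56) are deliberately left undecoded.

FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN", 59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of a bus/interconnect error code, format 0000 1PPT RRRR IILL.
PARTICIPATION = {0: "local processor originated", 1: "local processor responded",
                 2: "local processor observed", 3: "generic"}
REQUEST = {0: "generic", 1: "read", 2: "write", 3: "data read", 4: "data write",
           5: "instruction fetch", 6: "prefetch", 7: "evict", 8: "snoop"}
MEM_IO = {0: "memory", 1: "reserved", 2: "I/O", 3: "other"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_status(status: int) -> dict:
    """Split an MCi_STATUS value into flag bits and, for bus errors, sub-fields."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {"flags": flags, "mca_code": code}
    if (code & 0xF800) == 0x0800:  # bits 15:11 == 00001 -> bus/interconnect error
        result["bus_error"] = {
            "participation": PARTICIPATION[(code >> 9) & 3],
            "timeout": bool((code >> 8) & 1),
            "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
            "mem_io": MEM_IO[(code >> 2) & 3],
            "level": LEVEL[code & 3],
        }
    return result


if __name__ == "__main__":
    print(decode_status(0xB40000000000083B))
```

For b40000000000083b this reports VAL, UC, EN and ADDRV set (PCC clear) and a bus error describing a data read in I/O space originated by the local processor, without a timeout. That fits the faulting RIP being inside pci_conf1_read(), which performs exactly such I/O-port reads.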
> See what effect
>
> pci=bios
No effect.
> pci=conf1
No effect.
> pci=conf2
No effect without acpi=off.
With acpi=off, it gets considerably farther before apparently failing
to talk to the 3ware card:
------------------------
Brought up 2 CPUs
testing NMI watchdog ... OK.
migration_cost=629
NET: Registered protocol family 16
ACPI: Subsystem revision 20060127
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
SCSI subsystem initialized
PCI: System does not support PCI
PCI: System does not support PCI
PCI-DMA: Disabling AGP.
PCI-DMA: More than 4GB of RAM and no IOMMU
PCI-DMA: 32bit PCI IO may malfunction.
PCI-DMA: Disabling IOMMU.
WARNING more than 4GB of memory but IOMMU not available.
WARNING 32bit PCI may malfunction.
NET: Registered protocol family 2
IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
vga16fb: mapped to 0xffff8100000a0000
Console: switching to colour frame buffer device 80x30
fb0: VGA16 VGA frame buffer device
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
Adaptec aacraid driver (1.1-5[2409]-mh1)
3ware Storage Controller device driver for Linux v1.26.02.001.
3ware 9000 Storage Controller device driver for Linux v2.26.02.007.
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP bic registered
NET: Registered protocol family 8
NET: Registered protocol family 20
VFS: Cannot open root device "sda2" or unknown-block(0,0)
Please append a correct "root=" boot option
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
--------------------------
> pci=nommconf
No effect.
> pci=nomsi
No effect.
> have and report back.
>
> What drivers do you have enabled
I'm not completely certain I know what you're asking there, but I
think this answers it:
http://teddyb.org/~rlpowell/media/regular/lkml/2.6.8.11.non-bi.config.txt
> and what pci devices are present ?
http://teddyb.org/~rlpowell/media/regular/lkml/lspci_v.txt
-Robin
--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/