Message-ID: <4ACB3741.2030101@gmail.com>
Date: Tue, 06 Oct 2009 15:25:37 +0300
From: Harri Olin <harri.olin@...il.com>
To: Mark Lord <liml@....ca>
CC: Bernie Innocenti <bernie@...ewiz.org>, linux-ide@...r.kernel.org,
lkml <linux-kernel@...r.kernel.org>, sysadmin <sysadmin@....org>
Subject: Re: sata_mv 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040
Mark Lord wrote:
> Bernie Innocenti wrote:
>> The error in the subject appears in the console immediately followed by
>> a hard freeze of the machine. The error occurs reproducibly on two
>> identical Opteron servers, each one equipped with two identical
>> controller cards:
>>
>> 03:04.0 SCSI storage controller: Marvell Technology Group Ltd.
>> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
>> 03:06.0 SCSI storage controller: Marvell Technology Group Ltd.
>> MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
>>
>> We can trigger the problem within a few seconds by starting a
>> reconstruction on a drive hooked to port 4 (counting from 0) of the
>> second controller. Oddly, every other drive works reliably and the
>> faulty drive works if we connect it to, for example, port 4 of the first
>> controller.
>>
>> Tested with Debian kernels 2.6.26-19 and 2.6.30-8. Let me know if
>> further details are needed.
> ..
>> 0000:03:06.0: PCI ERROR; PCI IRQ cause=0x30000040..
> ..
>
> 0x30000040 here means "MRdPerr":
> "bad data parity detected during PCI master read".
>
> That means a data parity error happened
> during an outgoing data transfer on the PCI-X bus.
> This could happen due to noise on the bus,
> dying capacitors, or (?) bad RAM (not sure about the last one).
>
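For reference, the cause value from the log can be split into its
individual set bits. The sketch below only lists bit positions and masks;
it does not assign names to them, since the mapping from bits to names
such as "MRdPerr" comes from the controller documentation and is not
reproduced here. The helper name set_bits is made up for illustration.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of all set bits, least significant first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Cause value copied from the log line in the subject.
cause = 0x30000040

# Print each set bit with its mask; which bit carries which meaning
# is deliberately left open here.
for bit in set_bits(cause):
    print(f"bit {bit:2d}  mask 0x{1 << bit:08x}")
```

Comparing this breakdown across several occurrences of the error would
show whether the same bits are raised every time.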
I have heard of the same thing happening with the same kind of
configuration, using a Supermicro H8DME-2 motherboard and an Opteron
2378 CPU. Even the controllers were in the same slots.

My initial suspicion was that the motherboard does not drop the PCI-X
bus frequency to 100MHz and instead drives the bus at 133MHz even though
two controllers are connected. The proposed fix was to move the second
controller to the other bus, as the H8DME-2 has four PCI-X slots,
2x100MHz and 2x133MHz, but I haven't yet heard back whether it helped.

Even the kernel was the same: the latest Debian distribution kernel. It
might be worthwhile to try a vanilla kernel.org kernel if possible.

At home I have two 6081 controllers on the same bus, but at 100MHz, and
no problems so far.
--
Harri.