Message-ID: <18581.6873.353028.695909@stoffel.org>
Date: Sat, 2 Aug 2008 22:41:29 -0400
From: "John Stoffel" <john@...ffel.org>
To: linasvepstas@...il.com
Cc: "John Stoffel" <john@...ffel.org>,
"Alistair John Strachan" <alistair@...zero.co.uk>,
linux-kernel@...r.kernel.org
Subject: Re: amd64 sata_nv (massive) memory corruption
>>>>> "Linas" == Linas Vepstas <linasvepstas@...il.com> writes:
Linas> 2008/8/2 John Stoffel <john@...ffel.org>:
>>>>>>> "Linas" == Linas Vepstas <linasvepstas@...il.com> writes:
>>
Linas> 2008/8/1 Alistair John Strachan <alistair@...zero.co.uk>:
>>>> On Friday 01 August 2008 18:30:34 Linas Vepstas wrote:
>>>>> Hi,
>>>>>
>>>>> I'm seeing strong, easily reproducible (and silent) corruption on a
>>>>> sata-attached disk drive on an amd64 board. It might be the disk
>>>>> itself, but I doubt it; googling suggests that it's somehow
>>>>> iommu-related but I cannot confirm this.
>>
>> Can you post the output of dmesg after a boot, so we can see which
>> driver is being used? I assume the new Libata stuff, but maybe you
>> can also turn on debugging in there as well. Stuff like SCSI_DEBUG
>> (in the SCSI menus) might show us more details here.
>>
>> Also, have you tried a new SATA cable by any chance? That's obviously
>> the cheaper path than getting a new disk...
Linas> I took the problematic hard drive (and its cable) to another
Linas> computer with sata ports on it, and ran my
Linas> file-copy/compare/fsck tests there, and saw no problems; so the
Linas> drive itself and its cable get a clean bill of health.
Well that's a good sign.
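For anyone who wants to repeat that kind of copy/compare test on a
suspect box, here is a minimal sketch in Python. It is not the script
Linas used (that was never posted); the function names and paths are
made up for illustration. It copies a file and compares SHA-256 hashes
of source and copy. Note the assumption flagged in the comments: right
after a copy, reads are usually served from the page cache, so to
really exercise the disk you'd want files larger than RAM or to drop
caches between the copy and the compare.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether the two files hash the same.

    Caveat: the read-back may come from the page cache rather than the
    disk, so a match here does not prove the on-disk data is good.
    """
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Hypothetical demo: a few MiB of random data in a temp directory.
    # Point src/dst at the suspect disk for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(4 * 1024 * 1024))
        print("match" if copy_and_verify(src, dst) else "MISMATCH")
```

Run in a loop over large files, a single mismatch is enough to say
something between the filesystem and the platters is mangling data.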
Linas> Then, rather stupidly, I flashed the latest BIOS for the
Linas> motherboard and now have a dead motherboard (it hangs on its
Linas> way through BIOS, well before the bootloader.) So I'm off to
Linas> buy a new mobo today.
Awww fuckies. Sorry to suggest this path to you. You might be able
to get it back by clearing the CMOS as well. And hey, it could have
been a bad mobo in the end too.
Linas> I'll send the dmesg from the older boots later today, if all
Linas> goes well. I'm pretty sure I had the new libata on, and the
Linas> old off -- but it's possible that the .config somehow managed to
Linas> pull in parts of the old libata code anyway. I say this
Linas> because, besides the SATA, the blown motherboard had an IDE
Linas> connector in use, and I also had another PCI IDE card plugged
Linas> in and in use. I'm imagining that perhaps the PCI IDE .config
Linas> might have pulled in old code, maybe via header file, and thus
Linas> mangled some lock that the sata side was using. Just a wild
Linas> guess. -- Most people on this mobo hadn't seen problems, and
Linas> unlike most people, I had the PCI IDE card in it.
Hmmm... I've sorta run into this, but on my old system, which has the
following: Adaptec SCSI built in (boot drive), LSI SCSI PCI card
(tape library and drives), PATA on board (for DVD), SIL SATA PCI card
(data disks), HighPoint PCI card (two scratch disks). Total pain in
the butt figuring out the right mix of libata SATA/PATA drivers vs the
old plain PATA drivers. Once I got it working with pretty much all
/dev/sd* devices, I just left it alone. :] Oh yeah, an 8 port
serial card and a Gigabit ethernet card as well. It's full to the
gills.
My new system is mostly my desktop, not my server, so I haven't pushed
it as hard bus-wise.
Good luck, sorry I can't help directly. Do you want to see my dmesg
output for comparison?
John