linux-kernel - Re: amd64 sata_nv (massive) memory corruption


Message-ID: <3ae3aa420808021501k2e871dc0y344dd7f9a7b80614@mail.gmail.com>
Date:	Sat, 2 Aug 2008 17:01:46 -0500
From:	"Linas Vepstas" <linasvepstas@...il.com>
To:	"John Stoffel" <john@...ffel.org>
Cc:	"Alistair John Strachan" <alistair@...zero.co.uk>,
	linux-kernel@...r.kernel.org
Subject: Re: amd64 sata_nv (massive) memory corruption

2008/8/2 John Stoffel <john@...ffel.org>:
>>>>>> "Linas" == Linas Vepstas <linasvepstas@...il.com> writes:
>
> Linas> 2008/8/1 Alistair John Strachan <alistair@...zero.co.uk>:
>>> On Friday 01 August 2008 18:30:34 Linas Vepstas wrote:
>>>> Hi,
>>>>
>>>> I'm seeing strong, easily reproducible (and silent) corruption on a
>>>> sata-attached disk drive on an amd64 board.  It might be the disk
>>>> itself, but I doubt it; googling suggests that its somehow
>>>> iommu-related but I cannot confirm this.
>
> Can you post the output of dmesg after a boot, so we can see which
> driver is being used?  I assume the new Libata stuff, but maybe you
> can also turn on debugging in there as well.  Stuff like SCSI_DEBUG
> (in the SCSI menus) might show us more details here.
>
> Also, have you tried a new SATA cable by any chance?  That's obviously
> the cheaper path than getting a new disk...

I took the problematic hard drive (and its cable) to another computer
with sata ports on it,  and ran my file-copy/compare/fsck tests there,
and saw no problems; so the drive itself and its cable get a clean bill
of health.
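
For anyone wanting to repeat this kind of check, a copy/compare test can
be sketched roughly as below. This is only a minimal illustration, not
the actual test that was run; the function names, file sizes and round
count are all assumptions made up for the example.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_test_file(path, size_bytes=8 << 20):
    """Write a file of random bytes to use as the copy source (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(os.urandom(size_bytes))


def copy_and_compare(src, dst_dir, rounds=3):
    """Copy src into dst_dir several times; return the copies whose hash differs."""
    expected = sha256_of(src)
    bad = []
    for i in range(rounds):
        dst = os.path.join(dst_dir, f"copy_{i}.bin")
        shutil.copyfile(src, dst)
        # Caveat: this read may be served from the page cache rather than
        # the disk, so a real test needs files larger than RAM or a cache
        # drop between the copy and the compare.
        if sha256_of(dst) != expected:
            bad.append(dst)
    return bad


if __name__ == "__main__":
    # dst_dir should live on the suspect disk; a temp dir is used here
    # only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "source.bin")
        make_test_file(src)
        mismatches = copy_and_compare(src, d)
        print("mismatched copies:", mismatches)
```

On a healthy drive the list of mismatched copies should come back empty;
any entry in it means a copy read back differently from the source.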

Then, rather stupidly, I flashed the latest BIOS for the motherboard
and now have a dead motherboard (it hangs on its way through BIOS,
well before the bootloader).  So I'm off to buy a new mobo today.

I'll send the dmesg from the older boots later today, if all goes well.
I'm pretty sure I had the new libata on, and the old off -- but it's
possible that the .config somehow managed to pull in parts of the
old libata code anyway. I say this because, besides the SATA, the
blown motherboard had an IDE connector in use, and I also had
another PCI IDE card plugged in and in use. I'm imagining that
perhaps the PCI IDE .config might have pulled in old code, maybe
via a header file, and thus mangled some lock that the sata side
was using. Just a wild guess.  Most people on this mobo hadn't
seen problems, and unlike most people, I had the PCI IDE card
in it.
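
One way to check a guess like this is to list which old-IDE and which
libata options a given .config actually enables. The sketch below is
only an illustration: the helper names are made up, the option prefixes
are my assumption about which symbols matter, and the sample text is a
hypothetical config, not the one from the machine discussed here.

```python
import re

# Hypothetical sample, NOT the real .config from this machine.
SAMPLE_CONFIG = """\
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_ATA=y
CONFIG_SATA_NV=y
# CONFIG_PATA_AMD is not set
"""


def enabled_options(config_text):
    """Return {option: value} for lines of the form CONFIG_FOO=value."""
    opts = {}
    for line in config_text.splitlines():
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def split_ide_and_libata(opts):
    """Split enabled options into old IDE subsystem vs. libata groups."""
    old_ide = sorted(
        k for k in opts
        if k == "CONFIG_IDE"
        or k.startswith(("CONFIG_IDE_", "CONFIG_BLK_DEV_IDE"))
    )
    libata = sorted(
        k for k in opts
        if k == "CONFIG_ATA"
        or k.startswith(("CONFIG_ATA_", "CONFIG_SATA_", "CONFIG_PATA_"))
    )
    return old_ide, libata


if __name__ == "__main__":
    old_ide, libata = split_ide_and_libata(enabled_options(SAMPLE_CONFIG))
    print("old IDE options:", old_ide)
    print("libata options: ", libata)
```

Having both groups enabled does not by itself prove a conflict; it only
shows that both code paths were built, which is the situation being
guessed at above.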

--linas