[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070401030315.GA24080@jim.sh>
Date: Sat, 31 Mar 2007 20:03:16 -0700
From: Jim Paris <jim@...n.com>
To: Aaron Lehmann <aaronl@...elus.com>
Cc: linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Silent corruption on AMD64
Aaron Lehmann wrote:
> I discovered a reproducible way of causing silent file corruption.
...
> 1. Heavy Ethernet load (nc remotehost < /dev/zero)
> 2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
> 3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null)
Since it shows up under heavy load that includes unrelated devices, I
think ruling out hardware problems is important. Some suggestions:
- Use mcelog to see if you're getting any machine check exceptions
that would indicate hardware error: http://freshmeat.net/projects/mcelog/
- Use the edac module to turn on pci parity and memory error checks:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/drivers/edac/edac.txt
- Run memtest86+ for several loops to make sure your RAM is ok
- Try moving the SiI card to a different slot
- Try running the SATA drives from a separate power supply
- Move disks and cables around to see whether the problem follows the
disks, the cables, or the controllers
- Try enabling the "spread spectrum" clock option in your BIOS to
reduce EMI
-jim
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists