[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45786E58.5070308@citd.de>
Date: Thu, 07 Dec 2006 20:41:12 +0100
From: Matthias Schniedermeyer <ms@...d.de>
To: Alan Stern <stern@...land.harvard.edu>
Cc: linux-kernel@...r.kernel.org, usb-storage@...ts.one-eyed-alien.net
Subject: Re: [usb-storage] single bit errors on files stored on USB-HDDs
via USB2/usb_storage
Alan Stern wrote:
> On Wed, 6 Dec 2006, Matthias Schniedermeyer wrote:
>
>
>>Hi
>>
>>
>>I'm using a Bunch auf HDDs in USB-Enclosures for storing files.
>>(currently 38 HDD, with a total capacity of 9,5 TB of which 8,5 TB is used)
>>
>>After i realised about a year(!) ago that the files copied to the HDDs
>>sometimes aren't identical to the "original"-files i changed my
>>procedured so that each file is MD5 before and after and deleted/copied
>>again if an error is detected.
>>
>>My averate file size is about 1GB with files from about 400MB to 5000MB
>>I estimate the average error-rate at about one damaged file in about
>>10GB of data.
>>
>>I'm not sure and haven't checked if the files are wrongly written or
>>"only" wrongly read back as i delete the defective files and copy them
>>again.
>>
>>Today i copied a few files back and checked them against the stored MD5
>>sums and 5 files of 86 (each about 700 MB) had errors. So i copied the 5
>>files again. 4 of the files were OK after that and coping the last file
>>the third time also resulted in the correct MD5.
>>
>>This time i kept the defective files and used "vbindiff" to show me the
>>difference. Strangly in EVERY case the difference is a single bit in a
>>sequence of "0xff"-Bytes inside a block of varing bit-values that
>>changed a "0xff" into a "0xf7".
>>Also interesting is that each error is at a 0xXXXXXXX5-Position
>>
>>Attached is a file with 5 of the 6 differences named 1-5. Of each of the
>>5 2x3 lines-blocks the first 3 lines are the original the following 3
>>lines contain the error in the middle row 6th value.
>>
>>NEVER did i see any messages in syslog regarding erros or an aborting
>>program due to errors passed down from the kernel or something like that.
>
>
> This was almost certainly caused by hardware flaws in the USB interface
> chips of the enclosures. There's nothing the kernel can do about it
> because the errors aren't reported; all that happens is that incorrect
> data is sent to or from the drive.
So pretty much all ich can do is to pray that the errors don't corrupt
the Filesystem-Metadata (XFS).
So i should definetly consider writing me a "NO-FS" where the
"filesystem"-part is stored elsewhere and the HDD contains 100% content
(Minus a Dummy-MBR-Block for sector 0). On the plus side such a
filesystem won't have any overhead at all, but on the flipside you loose
pretty much the whole content if you lose the metadata. But i guess in
my case it would considerably lower the risk of loosing data.
Bis denn
--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists