Message-ID: <Pine.LNX.4.64.0706211040390.31303@p34.internal.lan>
Date: Thu, 21 Jun 2007 10:40:50 -0400 (EDT)
From: Justin Piszcz <jpiszcz@...idpixels.com>
To: Mattias Wadenstein <maswan@....umu.se>
cc: Neil Brown <neilb@...e.de>, David Chinner <dgc@....com>,
Avi Kivity <avi@...o.co.il>, david@...g.hm,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org
Subject: Re: limits on raid
On Thu, 21 Jun 2007, Mattias Wadenstein wrote:
> On Thu, 21 Jun 2007, Neil Brown wrote:
>
>> I have the - apparently naive - idea that drives use strong checksums,
>> and will never return bad data, only good data or an error. If this
>> isn't right, then it would really help to understand what the causes of
>> other failures are before working out how to handle them....
>
> In theory, that's how storage should work. In practice, silent data
> corruption does happen. If not from the disks themselves, somewhere along the
> path of cables, controllers, drivers, buses, etc. If you add in fcal, you'll
> get even more sources of failure, but usually you can avoid SANs (if you care
> about your data).
>
> Well, here are a couple of the issues that I've seen myself:
>
> A hw-raid controller returning every 64th bit as 0, no matter what's on disk.
> With no error condition at all. (I've also heard from a colleague about this
> on every 64k, but not seen that myself.)
>
> An fcal switch occasionally resetting, garbling the blocks in transit with
> random data. Lost a few TB of user data that way.
>
> Add to this the random driver breakage that happens now and then. I've also
> had a few broken filesystems due to in-memory corruption due to bad ram, not
> sure there is much hope of fixing that though.
>
> Also, this presentation is pretty worrying on the frequency of silent data
> corruption:
>
> https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257
>
> /Mattias Wadenstein
Very interesting slides/presentation, going to watch it shortly.
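
To make the hw-raid case quoted above concrete (every 64th bit read back as 0,
with no error reported), here is a minimal sketch of how a checksum kept above
the storage path would catch it. Everything in it is an assumption for
illustration only: the function names are hypothetical, CRC32 is just a
convenient stand-in for whatever checksum a real filesystem or application
would use, and the choice of which bit in each 64-bit group gets cleared is
arbitrary.

```python
import zlib
from typing import Tuple


def clear_every_64th_bit(data: bytes) -> bytes:
    """Simulate the faulty controller (hypothetical model): in every
    8-byte (64-bit) group, force the lowest bit of the last byte to 0."""
    out = bytearray(data)
    for i in range(7, len(out), 8):
        out[i] &= 0xFE
    return bytes(out)


def write_block(payload: bytes) -> Tuple[bytes, int]:
    """Return the payload plus a CRC32 computed before it enters the
    storage path. The checksum must be stored separately from the data."""
    return payload, zlib.crc32(payload)


def read_block(payload: bytes, stored_crc: int) -> bytes:
    """Recompute the CRC32 on read and refuse to return data that no
    longer matches the checksum recorded at write time."""
    if zlib.crc32(payload) != stored_crc:
        raise OSError("checksum mismatch: silent corruption detected")
    return payload


if __name__ == "__main__":
    block = b"\xff" * 4096
    data, crc = write_block(block)
    corrupted = clear_every_64th_bit(data)
    try:
        read_block(corrupted, crc)
    except OSError as err:
        print(err)
```

The point of the sketch is only that the check happens end to end: the
controller reports success, so nothing below the checksum layer would notice.
It detects the damage but does not repair it; recovery would still need a
redundant copy.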