Message-ID: <Pine.GSO.4.64.0706211417180.15647@montezuma.acc.umu.se>
Date:	Thu, 21 Jun 2007 14:40:44 +0200 (MEST)
From:	Mattias Wadenstein <maswan@....umu.se>
To:	Neil Brown <neilb@...e.de>
cc:	David Chinner <dgc@....com>, Avi Kivity <avi@...o.co.il>,
	david@...g.hm, linux-kernel@...r.kernel.org,
	linux-raid@...r.kernel.org
Subject: Re: limits on raid

On Thu, 21 Jun 2007, Neil Brown wrote:

> I have that - apparently naive - idea that drives use strong checksum,
> and will never return bad data, only good data or an error.  If this
> isn't right, then it would really help to understand what the cause of
> other failures are before working out how to handle them....

In theory, that's how storage should work. In practice, silent data 
corruption does happen. If it doesn't come from the disks themselves, it 
comes from somewhere along the path of cables, controllers, drivers, buses, 
etc. If you add in fcal, you'll get even more sources of failure, but 
usually you can avoid SANs (if you care about your data).

Well, here are a couple of the issues that I've seen myself:

A hw-raid controller returning every 64th bit as 0, no matter what's on 
disk, with no error condition at all. (I've also heard from a colleague 
about the same thing happening at every 64k, but have not seen that myself.)
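
To illustrate why a fault like this goes unnoticed unless something above the 
controller verifies the data, here is a minimal sketch (not from the original 
report). It assumes "every 64th bit" means the lowest bit of every 8th byte; 
the exact bit position, the helper names and the choice of SHA-256 are all 
hypothetical. A digest recorded at write time is compared against the data 
read back through the simulated faulty path.

```python
import hashlib


def clear_every_64th_bit(data: bytes) -> bytes:
    """Simulate the fault: force one bit in every 64-bit group to 0.

    Assumption: the affected bit is the lowest bit of every 8th byte.
    """
    out = bytearray(data)
    for i in range(7, len(out), 8):
        out[i] &= 0xFE
    return bytes(out)


def digest(data: bytes) -> str:
    """End-to-end checksum computed by the application, not the controller."""
    return hashlib.sha256(data).hexdigest()


# Digest recorded when the block was written.
block = bytes([0xFF]) * 4096
stored_digest = digest(block)

# The read path returns altered data and reports no error.
read_back = clear_every_64th_bit(block)

# Only the digest comparison reveals the corruption.
corrupted = digest(read_back) != stored_digest
print("corruption detected:", corrupted)
```

Note that data whose affected bits already happen to be 0 passes through 
unchanged, so a fault of this kind only shows up on some blocks.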

An fcal switch occasionally resetting, garbling the blocks in transit with 
random data. Lost a few TB of user data that way.

Add to this the random driver breakage that happens now and then. I've 
also had a few filesystems broken by in-memory corruption caused by bad 
RAM, though I'm not sure there is much hope of fixing that.

Also, this presentation paints a pretty worrying picture of the frequency 
of silent data corruption:

https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257

/Mattias Wadenstein