linux-ext4 - Re: [patch] ext2/3: document conditions when reliable operation is possible

Message-ID: <alpine.DEB.2.00.0908280738140.6822@asgard.lang.hm>
Date:	Fri, 28 Aug 2009 07:46:42 -0700 (PDT)
From:	david@...g.hm
To:	David Woodhouse <dwmw2@...radead.org>
cc:	Theodore Tso <tytso@....edu>, Pavel Machek <pavel@....cz>,
	Ric Wheeler <rwheeler@...hat.com>,
	Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	Rob Landley <rob@...dley.net>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [patch] ext2/3: document conditions when reliable operation is
 possible

On Thu, 27 Aug 2009, David Woodhouse wrote:

> On Mon, 2009-08-24 at 20:08 -0400, Theodore Tso wrote:
>>
>> (It's worse with people using Digital SLR's shooting in raw mode,
>> since it can take upwards of 30 seconds or more to write out a 12-30MB
>> raw image, and if you eject at the wrong time, you can trash the
>> contents of the entire CF card; in the worst case, the Flash
>> Translation Layer data can get corrupted, and the card is completely
>> ruined; you can't even reformat it at the filesystem level, but have
>> to get a special Windows program from the CF manufacturer to --maybe--
>> reset the FTL layer.
>
> This just goes to show why having this "translation layer" done in
> firmware on the device itself is a _bad_ idea. We're much better off
> when we have full access to the underlying flash and the OS can actually
> see what's going on. That way, we can actually debug, fix and recover
> from such problems.
>
>>   Early CF cards were especially vulnerable to
>> this; more recent CF cards are better, but it's a known failure mode
>> of CF cards.)
>
> It's a known failure mode of _everything_ that uses flash to pretend to
> be a block device. As I see it, there are no SSD devices which don't
> lose data; there are only SSD devices which haven't lost your data
> _yet_.
>
> There's no fundamental reason why it should be this way; it just is.
>
> (I'm kind of hoping that the shiny new expensive ones that everyone's
> talking about right now, that I shouldn't really be slagging off, are
> actually OK. But they're still new, and I'm certainly not trusting them
> with my own data _quite_ yet.)

so what sort of test would be needed to identify if a device has this 
problem?

people can do ad-hoc tests by pulling devices while they are in use and 
then checking the entire device, but something better should be available.

it seems to me that there are two things needed to define the tests.

1. a predictable write load so that it's easy to detect data getting lost

2. some statistical analysis to decide how many device pulls are needed 
(under the write load defined in #1) to make the odds high that the 
problem will be revealed.
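
as a rough illustration of point 1, here is a minimal Python sketch of one 
possible predictable write load. nothing in it comes from an existing tool: 
the 4096-byte block size, the header layout, and the names make_block, 
check_block and find_damaged are all assumptions made for the example. each 
block records its own block number and the pass it was written in, plus a 
checksum, so after a pull every block can be classified as intact, damaged, 
or sitting in the wrong place.

```python
import hashlib
import struct

BLOCK_SIZE = 4096                     # assumed block size, purely illustrative
HEADER = struct.Struct("<QQ")         # (block number, pass number), hypothetical layout
DIGEST_SIZE = hashlib.sha256().digest_size


def make_block(block_no: int, pass_no: int) -> bytes:
    """Build a self-describing block: header, deterministic filler, checksum."""
    header = HEADER.pack(block_no, pass_no)
    filler_len = BLOCK_SIZE - HEADER.size - DIGEST_SIZE
    # The filler is derived from the header, so the whole block is predictable.
    filler = hashlib.shake_128(header).digest(filler_len)
    payload = header + filler
    return payload + hashlib.sha256(payload).digest()


def check_block(data: bytes, block_no: int):
    """Return the pass number stored in the block, or None if it is damaged
    or belongs to a different block number."""
    if len(data) != BLOCK_SIZE:
        return None
    payload, digest = data[:-DIGEST_SIZE], data[-DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != digest:
        return None
    stored_no, pass_no = HEADER.unpack_from(payload)
    if stored_no != block_no:
        return None
    return pass_no


def find_damaged(dev, n_blocks: int) -> list:
    """Scan a seekable binary file object and list the damaged block numbers."""
    damaged = []
    for block_no in range(n_blocks):
        dev.seek(block_no * BLOCK_SIZE)
        if check_block(dev.read(BLOCK_SIZE), block_no) is None:
            damaged.append(block_no)
    return damaged
```

a real test would also need to write with the cache bypassed or flushed, and 
to log which blocks were in flight when the power went away, so that damage 
to those blocks can be told apart from damage to unrelated data. the sketch 
leaves both out.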

with this we could have people (or businesses; I think the tech hardware 
sites would jump on this given some sort of accepted test) test various 
devices and report whether the test detects unrelated data being lost.
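
for the statistical side (point 2 above), a first approximation is easy to 
sketch if one assumes that every pull independently triggers the problem with 
the same probability p. that is a simplifying assumption, not something 
measured, and real devices may well behave differently. under it, the chance 
of seeing at least one failure in n pulls is 1 - (1 - p)^n, which can be 
solved for n. the function name pulls_needed is made up for this example.

```python
import math


def pulls_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - p_fail)**n >= confidence.

    Assumes each pull fails independently with the same probability p_fail.
    """
    if not (0.0 < p_fail < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_fail and confidence must both be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# If a vulnerable device loses unrelated data on 5% of pulls, about 59 pulls
# are needed to have a 95% chance of catching it at least once.
print(pulls_needed(0.05, 0.95))
```

read the other way around, a device that survives n pulls with no damage has 
only been shown (at roughly 95% confidence, under the same assumption) to 
fail less often than about 3/n of the time.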

for USB devices there may be a way to use the power management functions 
to cut power to the device without requiring it to be physically pulled. 
if this is the case (even if it only works on some specific chipsets), 
it would drastically speed up the testing.

David Lang