Message-Id: <200909021800.51096.rob@landley.net>
Date:	Wed, 2 Sep 2009 18:00:48 -0500
From:	Rob Landley <rob@...dley.net>
To:	Ric Wheeler <rwheeler@...hat.com>
Cc:	Pavel Machek <pavel@....cz>, david@...g.hm,
	Theodore Tso <tytso@....edu>, Florian Weimer <fweimer@....de>,
	Goswin von Brederlow <goswin-v-b@....de>,
	kernel list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
	rdunlap@...otime.net, linux-doc@...r.kernel.org,
	linux-ext4@...r.kernel.org, corbet@....net
Subject: Re: [testcase] test your fs/storage stack (was Re: [patch] ext2/3: document conditions when reliable operation is possible)

On Wednesday 02 September 2009 15:42:19 Ric Wheeler wrote:
> On 09/02/2009 04:12 PM, Pavel Machek wrote:
> >>>> people aren't objecting to better documentation, they are objecting to
> >>>> misleading documentation.
> >>>
> >>> Actually Ric is. He's trying hard to make RAID5 look better than it
> >>> really is.
> >>
> >> I object to misleading and dangerous documentation that you have
> >> proposed. I spend a lot of time working in data integrity, talking and
> >> writing about it so I care deeply that we don't misinform people.
> >
> > Yes, truth is dangerous. To vendors selling crap products.
>
> Pavel, you have no information and an attitude of not wanting to listen to
> anyone who has real experience or facts. Not just me, but also Ted and
> others.
>
> Totally pointless to reply to you further.

For the record, I've been able to follow Pavel's arguments, and I've been able 
to follow Ted's arguments.  But as far as I can tell, you're arguing about a 
different topic than the rest of us.

There's a difference between:

A) This filesystem was corrupted because the underlying hardware is permanently 
damaged, no longer functioning as it did when it was new, and never will 
again.

B) We had a transient glitch that ate the filesystem.  The underlying hardware 
is as good as new, but our data is gone.

You can argue about whether or not "new" was ever any good, but Linux has run 
on PC-class hardware from day 1.  Sure, PC-class hardware remains crap in many 
different ways, but this is not a _new_ problem.  Refusing to work around what 
people actually _have_ and insisting we get a better class of user instead 
_is_ a new problem, kind of a disturbing one.

USB keys are the modern successor to floppy drives, and even now 
Documentation/blockdev/floppy.txt is still full of some of the torturous 
workarounds implemented for that over the past 2 decades.  The hardware 
existed, and instead of turning up their nose at it they made it work as best 
they could.

Perhaps what's needed for the flash thing is a userspace package, the way 
mdutils made floppies a lot more usable than the kernel managed at the time.  
For the flash problem perhaps some FUSE thing a bit like mtdblock might be 
nice, a translation layer remapping an arbitrary underlying block device into 
larger granularity chunks and being sure to do the "write the new one before 
you erase the old one" trick that so many hardware-only flash devices _don't_, 
and then maybe even use Pavel's crash tool to figure out the write granularity 
of various sticks and ship it with a whitelist people can email updates to so 
we don't have to guess large.  (Pressure on the USB vendors to give us a "raw 
view" extension bypassing the "pretend to be a hard drive, with remapping" 
hardware in future devices would be nice too, but won't help any of the 
hardware out in the field.  I'm not sure that block remapping wouldn't screw up 
_this_ approach either, but it's an example of something that could be 
_tried_.)
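
To make the "write the new one before you erase the old one" trick concrete, 
here is a minimal sketch of such a remapping layer. It is a toy under stated 
assumptions, not a FUSE or mtdblock implementation: the class name RemapLayer, 
its parameters, and the crash_before_commit flag are all hypothetical, the 
"device" is just a list in memory, and the mapping table is not persisted 
(a real layer would have to store the map safely as well).

```python
class RemapLayer:
    """Toy out-of-place update layer over a simulated block device.

    All names are hypothetical. Storage lives in memory and the
    logical-to-physical map is not persisted, which a real layer
    would also need to do safely.
    """

    def __init__(self, logical_chunks, chunk_size, spare_chunks=1):
        self.chunk_size = chunk_size
        total = logical_chunks + spare_chunks
        # Simulated physical storage: one bytes object per physical chunk.
        self.physical = [bytes(chunk_size) for _ in range(total)]
        # Logical chunk i initially lives in physical chunk i.
        self.mapping = list(range(logical_chunks))
        # Physical chunks currently holding no live data.
        self.free = list(range(logical_chunks, total))

    def read(self, logical):
        return self.physical[self.mapping[logical]]

    def write(self, logical, data, crash_before_commit=False):
        if len(data) != self.chunk_size:
            raise ValueError("data must be exactly one chunk")
        if not self.free:
            raise RuntimeError("no spare chunk available")
        new = self.free.pop()
        # Step 1: write the new copy into a spare chunk.
        self.physical[new] = bytes(data)
        if crash_before_commit:
            # Simulated power loss: the map was never switched, so the
            # old copy is still the live one. Recovery just reclaims
            # the orphaned spare chunk.
            self.free.append(new)
            return
        # Step 2: point the logical chunk at the new copy.
        old = self.mapping[logical]
        self.mapping[logical] = new
        # Step 3: only now erase and recycle the old copy.
        self.physical[old] = bytes(self.chunk_size)
        self.free.append(old)


layer = RemapLayer(logical_chunks=2, chunk_size=4)
layer.write(0, b"aaaa")
layer.write(0, b"bbbb", crash_before_commit=True)
print(layer.read(0))  # the interrupted write did not destroy the old data
```

The point of the ordering is that at every instant at least one complete copy 
of the chunk exists, so an interrupted write costs the new data at worst, 
never the old. The chunk_size here stands in for the per-device write 
granularity that a whitelist would supply.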

However, thinking about how to _fix_ a problem is predicated on acknowledging 
that there actually _is_ a problem.  "The hardware is not physically damaged 
but your data was lost" sounds to me like a software problem, and thus 
something software could at least _attempt_ to address.  "There's millions of 
'em, Linux can't cope" doesn't seem like a useful approach.

I already addressed the software raid thing last post.

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds