Message-Id: <200908241611.10400.rob@landley.net>
Date: Mon, 24 Aug 2009 16:11:08 -0500
From: Rob Landley <rob@...dley.net>
To: Pavel Machek <pavel@....cz>
Cc: Goswin von Brederlow <goswin-v-b@....de>,
kernel list <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...l.org>, mtk.manpages@...il.com,
tytso@....edu, rdunlap@...otime.net, linux-doc@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible
On Monday 24 August 2009 04:31:43 Pavel Machek wrote:
> Running journaling filesystem such as ext3 over flashdisk or degraded
> RAID array is a bad idea: journaling guarantees no longer apply and
> you will get data corruption on powerfail.
>
> We can't solve it easily, but we should certainly warn the users. I
> actually lost data because I did not understand these limitations...
>
> Signed-off-by: Pavel Machek <pavel@....cz>
Acked-by: Rob Landley <rob@...dley.net>
With a couple comments:
> +* write caching is disabled. ext2 does not know how to issue barriers
> + as of 2.6.28. hdparm -W0 disables it on SATA disks.
It's coming up on 2.6.31. Has ext2 learned anything since, or should that
version number be bumped?
> + (Thrash may get written into sectors during powerfail. And
> + ext3 handles this surprisingly well at least in the
> + catastrophic case of garbage getting written into the inode
> + table, since the journal replay often will "repair" the
> + garbage that was written into the filesystem metadata blocks.
> + It won't do a bit of good for the data blocks, of course
> + (unless you are using data=journal mode). But this means that
> + in fact, ext3 is more resistant to suriving failures to the
> + first problem (powerfail while writing can damage old data on
> + a failed write) but fortunately, hard drives generally don't
> + cause collateral damage on a failed write.
Possible rewording of this paragraph:
Ext3 handles trash getting written into sectors during powerfail
surprisingly well. It's not foolproof, but it is resilient. Incomplete
journal entries are ignored, and journal replay of complete entries will
often "repair" garbage written into the inode table. The data=journal
option extends this behavior to file and directory data blocks as well
(without which your dentries can still be badly corrupted by a power fail
during a write).
(I'm not entirely sure about that last bit, but clarifying it one way or the
other would be nice because I can't tell from reading it which it is. My
_guess_ is that directories are just treated as files with an attitude and an
extra caching layer...?)
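A toy sketch of the replay behavior described in the proposed rewording, in
case it helps pin down the wording. This is not ext3's on-disk format or the
jbd code: Transaction, replay() and the block numbers are made up for
illustration, and stopping at the first uncommitted entry is an assumption of
the sketch. Whether directory blocks behave like block 1 or block 2 below is
exactly the open question above, so the sketch does not answer it.

```python
# Toy model only: hypothetical names, not ext3/jbd structures.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transaction:
    # block number -> contents the transaction wants on disk
    writes: Dict[int, bytes] = field(default_factory=dict)
    # True once the commit record made it into the journal
    committed: bool = False


def replay(disk: Dict[int, bytes], journal: List[Transaction]) -> Dict[int, bytes]:
    """Return the disk state after replaying committed transactions in order."""
    result = dict(disk)  # leave the caller's copy untouched
    for txn in journal:
        if not txn.committed:
            # Incomplete entry: ignore it. Assumption: nothing after it is
            # replayed either, so stop here.
            break
        # A complete entry overwrites whatever is in those blocks now,
        # including garbage left behind by the power failure.
        result.update(txn.writes)
    return result


# Block 1 stands in for the inode table, block 2 for a file data block.
# Both were trashed when the power failed.
disk = {1: b"<garbage>", 2: b"<garbage>"}
journal = [
    Transaction({1: b"inode table v2"}, committed=True),
    Transaction({1: b"inode table v3"}, committed=False),
]
after = replay(disk, journal)
# after[1] is b"inode table v2": the committed metadata replaced the garbage.
# after[2] is still b"<garbage>": that block was never in the journal.
```

In this model data=journal would correspond to the committed transaction also
carrying block 2, in which case replay would repair it the same way.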
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds