[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081215102450.GA9064@elf.ucw.cz>
Date: Mon, 15 Dec 2008 11:24:50 +0100
From: Pavel Machek <pavel@...e.cz>
To: Theodore Tso <tytso@....edu>, Chris Friesen <cfriesen@...tel.com>,
mikulas@...ax.karlin.mff.cuni.cz, clock@...ey.karlin.mff.cuni.cz,
kernel list <linux-kernel@...r.kernel.org>, aviro@...hat.com
Cc: Andrew Morton <akpm@...l.org>
Subject: [patch] Re: writing file to disk: not as easy as it looks
Hi!
> > > Heck, if you have a hiccup while writing an inode table block out to
> > > disk (for example a power failure at just the wrong time), so the
> > > memory (which is more voltage sensitive than hard drives) DMA's
> > > garbage which gets written to the inode table, you could lose a large
> > > number of adjacent inodes when garbage gets splatted over the inode
> > > table.
> >
> > Ok, "memory failed before disk" is ... bad hardware.
>
> It's PC class hardware. Live with it. Back when SGI made their own
> hardware, they noticed this problem, and so they wired up their SGI
> machines with powerfail interrupts, and extra big capacitors in
> their
Seems like bad hardware is very common indeed. Anyway, I guess it
would be fair to document what ext3 expects from disk subsystem for
safe operation. Does that summary sound correct/fair?
Signed-off-by: Pavel Machek <pavel@...e.cz>
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9dd2a3b..3855fbd 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -188,6 +188,34 @@ mke2fs: create a ext3 partition with th
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer
+Requirements
+============
+
+Ext3 expects disk/storage subsystem to behave sanely. On sanely
+behaving disk subsystem, data that have been successfully synced will
+stay on the disk. Sane means:
+
+* writes to media never fail. Even if disk returns error condition during
+ write, ext3 can't handle that correctly, because success on fsync was already
+ returned when data hit the journal.
+
+ (Fortunately writes failing are very uncommon on disks, as they
+ have spare sectors they use when write fails.)
+
+* either whole sector is correctly written or nothing is written during
+ powerfail.
+
+ (Unfortuantely, all the cheap USB/SD flash cards I seen do behave
+ like this, and are unsuitable for ext3. Because RAM tends to fail
+ faster than rest of system during powerfail, special hw killing
+ DMA transfers may be neccessary. Not sure how common that problem
+ is on generic PC machines).
+
+* either write caching is disabled, or hw can do barriers and they are enabled.
+
+ (Note that barriers are disabled by default, use "barrier=1"
+ mount option after making sure hw can support them).
+
References
==========
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists