Date:	Tue, 2 Dec 2008 11:37:20 -0500
From:	Theodore Tso <tytso@....edu>
To:	Pavel Machek <pavel@...e.cz>
Cc:	mikulas@...ax.karlin.mff.cuni.cz, clock@...ey.karlin.mff.cuni.cz,
	kernel list <linux-kernel@...r.kernel.org>, aviro@...hat.com
Subject: Re: writing file to disk: not as easy as it looks

On Tue, Dec 02, 2008 at 04:26:18PM +0100, Pavel Machek wrote:
> > I can understand why you might want to fsync the containing directory
> > to make sure the directory entry got written to disk --- but if you're
> > that paranoid, many modern filesystems use some kind of tree
> > structure
> 
> If I'm trying to write foo/bar/baz/file, and the file/baz inodes/dentries
> are written to disk, but foo is not, the file still will not be found
> under its full name - and recovering it from lost+found is hard to do
> automatically.

Only if you've freshly created the foo/bar/baz directories...  If you
have, then yes, you'll need to sync each one.  Normally the paranoid
programs do this after each mkdir call, though.
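
A minimal sketch of what that pattern looks like (the helper name is
made up, and error handling is abbreviated):

#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create "path", then fsync() the parent directory so the new
 * dentry actually reaches the disk, not just the page cache. */
int mkdir_durable(const char *parent, const char *path, mode_t mode)
{
        int dirfd;

        if (mkdir(path, mode) < 0)
                return -1;

        dirfd = open(parent, O_RDONLY | O_DIRECTORY);
        if (dirfd < 0)
                return -1;
        if (fsync(dirfd) < 0) {
                close(dirfd);
                return -1;
        }
        return close(dirfd);
}

You'd repeat that for each freshly created component of the path.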

For ext3/ext4, because of the entangled commit factor, fsync()'ing
the file is sufficient, but that's not something you can properly
count upon.

> If the disk loses data after acknowledging the write, all hope is lost.
> Else I expect filesystem to preserve data I successfully synced.
> 
>      (In the failed b-tree split case I'd expect the transaction
>      commit to fail because the new data could not be written; at
>      that point disk+journal should still contain all the data
>      needed for recovery of synced/old files, right?)

Not necessarily.  For filesystems that do logical journalling (e.g.,
xfs, jfs, et al.), the only thing written in the journal is the
logical change (e.g., "new dir entry 'file_that_causes_the_node_split'").

The transaction commits *first*, and then the filesystem tries to
update the on-disk structures with the change; it's only at that
point that the write fails.  Data can very easily get lost.

Even for ext3/ext4, which do physical journalling, it's still the
case that the journal commits first, and the change is only written
out to its final location later.  If the disk fails some of those
writes, it's possible to lose data, especially if the two blocks
involved in the node split are far apart and the write to the
existing old btree block fails.

> > What exactly are your requirements here, and what are you trying to
> > do?  What are you worried about?  Most MTA's are quite happy
> > settling
> 
> I'm trying to put my main filesystem on a SD card. hp2133 has only 4GB
> internal flash, so I got 32GB SDHC. Unfortunately, SD card on hp is
> very easy to eject by mistake.

So what you really want is some way of constantly flushing data to the
disk, probably after every single mkdir, every single close operation.
Of course, the tradeoff is that your flash card will get a lot of
extra wear.  I hate to say this, but have you considered something
like tape or velcro to secure the SD card?
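
If you do go the flush-everything route, in userspace it amounts to
roughly this (a sketch; the wrapper name is invented, and note that
fsync() on the fd covers the file's data and inode, not the directory
entry):

#include <unistd.h>

/* fsync() before every close(), so a surprise card ejection loses
 * as little buffered data as possible. */
int close_durable(int fd)
{
        if (fsync(fd) < 0) {
                close(fd);
                return -1;
        }
        return close(fd);
}

Mounting the filesystem with "-o sync" would get you much the same
effect wholesale, at the same cost in extra wear.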

						- Ted
