Message-ID: <20090325215016.GP32307@mit.edu>
Date:	Wed, 25 Mar 2009 17:50:16 -0400
From:	Theodore Tso <tytso@....edu>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Wed, Mar 25, 2009 at 03:48:51PM -0400, Christoph Hellwig wrote:
> On Wed, Mar 25, 2009 at 02:58:24PM -0400, Theodore Tso wrote:
> > omits the fsync().  So with ext4 we have workarounds that start pushing
> > out the data blocks for the replace-via-rename and
> > replace-via-truncate cases, while XFS will do an implied fsync for
> > replace-via-truncate only, and btrfs will do an implied fsync for
> > replace-via-rename only.
> 
> The XFS one and the ext4 one that I saw only start an _asynchronous_
> writeout.  Which is not an implied fsync but snake oil to make the
> most common complaints go away without providing hard guarantees.

It actually does the right thing for ext4, because once we allocate
the blocks, the default data=ordered mode means that we flush the
datablocks before we execute the commit.  Hence, in the case of
open/write/close/rename, the rename will trigger an async writeout,
but before the commit block is actually written, we'll have flushed
out the data blocks.

I was under the impression that XFS was doing a synchronous fsync
before allowing the close() to return, but if all it is doing is
triggering an async writeout, then yes, your concern is correct.  The
bigger problem from
my perspective is that XFS is only doing this for the truncate case,
and (from what I've been told) not for the rename case.  The truncate
is fundamentally racy, and application writers that do this
definitely don't deserve our solicitude, IMHO.  But people who do
open/write/close/rename, and omit the fsync before the rename, are at
least somewhat more deserving of some kind of workaround than the
idiots that do open/truncate/write/close.
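
A minimal sketch of the open/write/close/rename pattern with the
fsync() included, written in Python rather than C for brevity.  The
helper name replace_via_rename and the final directory fsync are
assumptions added for illustration; neither is specified in this
thread.

```python
import os
import tempfile


def replace_via_rename(path, data):
    """Replace `path` with `data` (bytes) using write/fsync/rename.

    Hypothetical helper for illustration only.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # The step that is often omitted: force the data blocks
            # out before the rename makes the new file visible.
            os.fsync(f.fileno())
        # rename(2) semantics: readers see the old or the new file,
        # never a truncated one.
        os.replace(tmp, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also fsync the directory so the rename itself is
    # durable.  Opening a directory like this works on Linux.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

By contrast, open/truncate/write/close leaves a window in which the
file on disk is empty or partial, and no fsync() placed afterwards
can close that window.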

> IFF we want to go down this route we should better provide strong
> guaranteed semantics and document the property.  And of course implement
> it consistently on all native filesystems.

That's something we should talk about at LSF.  I'm not all that eager
(or happy) about doing this, but I think that, given that the
application writers massively outnumber us, we are going to be bullied
into it.

> Note that the rename for atomic commits trick originated in mail servers
> which always did the proper fsync.  When the word spread into the
> desktop world it looks like this wisdom got lost.

Yep, agreed.

To be fair, though, one problem which Matthew Garrett has pointed out
is that if lots of applications issue fsync(), it will have the
tendency to wake up the hard drive a lot, and do a real number on
power utilization.  I believe the right solution for this is an
extension to laptop mode which synchronizes the filesystem at a clean
point, and then suppresses fsync()s until the hard drive wakes
up, at which point it should flush all dirty data to the drive and
then freeze writes to the disk again.  Presumably that should be OK,
because people who are using laptop mode are inherently trading off a
certain amount of safety for power savings; but then other people who want to
run a mysql server on a laptop get cranky, and then if we start
implementing ways that applications can exempt themselves from the
fsync() suppression, the complexity level starts rising.
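
The policy described above can be illustrated with a toy user-space
model.  This is only a sketch of the idea, not kernel code and not an
existing laptop-mode feature; the class LaptopModeSim and all of its
method names are hypothetical.

```python
class LaptopModeSim:
    """Toy model of fsync() suppression while the drive is spun down."""

    def __init__(self, laptop_mode):
        self.laptop_mode = laptop_mode
        self.dirty = {}      # path -> data still only in memory
        self.disk = {}       # path -> data that has reached the drive
        self.suppressed = 0  # fsync() calls that were deferred

    def write(self, path, data):
        # Writes always land in memory first.
        self.dirty[path] = data

    def fsync(self, path):
        if self.laptop_mode:
            # Suppressed: the data stays dirty and the drive stays asleep.
            self.suppressed += 1
        elif path in self.dirty:
            # Normal behaviour: this file's data goes to the drive now.
            self.disk[path] = self.dirty.pop(path)

    def drive_wakeup(self):
        # The drive woke up for some other reason: flush everything,
        # after which writes are frozen again.
        self.disk.update(self.dirty)
        self.dirty.clear()

    def crash(self):
        # After a crash only what reached the drive survives.
        self.dirty.clear()
        return dict(self.disk)
```

In this model an application that calls fsync() between wakeups gets
no durability until the next drive_wakeup(), which is exactly the
safety-for-power trade-off in question.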

This is a pretty complicated problem....  if people want to mount the
filesystem with the sync mount option, sure, but when people want
safety, speed, efficiency, power savings, *and* they want to use
crappy proprietary device drivers that crash if you look at them
funny, *and* be solicitous to application writers that rewrite
hundreds of files on desktop startup (even though it's not clear *why*
it is useful for KDE or GNOME to rewrite hundreds of files when the
user logs in and initializes the desktop), something has got to give.

There's nothing to trade off, other than the sanity of the file system
maintainers.  (But that's OK, Linus has called us crazy already.  :-/)

	      	   	      	    	       - Ted
