lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 24 Mar 2009 14:45:49 -0400
From:	Theodore Tso <tytso@....edu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Alan Cox <alan@...rguk.ukuu.org.uk>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	Jens Axboe <jens.axboe@...cle.com>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Tue, Mar 24, 2009 at 10:55:40AM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 24 Mar 2009, Theodore Tso wrote:
> > 
> > Try ext4, I think you'll like it.  :-)
> > 
> > Failing that, data=writeback for single-user machines is probably your
> > best bet.
> 
> Isn't that the same fix? ext4 just defaults to the crappy "writeback" 
> behavior, which is insane.

Technically, it's not data=writeback.  It's more like XFS's delayed
allocation; I've added workarounds so that files that which are
replaced via truncate or rename get pushed out right away, which
should solve most of the problems involved with files becoming
zero-length after a system crash.

> Sure, it makes things _much_ smoother, since now the actual data is no 
> longer in the critical path for any journal writes, but anybody who thinks 
> that's a solution is just incompetent. 
> 
> We might as well go back to ext2 then. If your data gets written out long 
> after the metadata hit the disk, you are going to hit all kinds of bad 
> issues if the machine ever goes down.

With ext2 after a system crash you need to run fsck.  With ext4, fsck
isn't an issue, but if the application doesn't use fsync(), yes,
there's no guarantee (other than the workarounds for
replace-via-truncate and replace-via-rename), but there's plenty of
prior history that says that applications that care about data hitting
the disk should use fsync().  Otherwise, it will get spread out over a
few minutes; and for some files, that really won't make a difference.

For precious files, applications that use fsync() will be safe ---
otherwise, even with ext3, you can end up losing the contents of the
file if you crash right before 5 second commit window.  At least back
in the days when people were proud of their Linux systems having 2-3
year uptimes, and where jiffies could actually wrap from time to time,
the difference between 5 seconds and 3 minutes really wasn't that big
of a deal.  People who really care about this can turn off delayed
allocation with the nodelalloc mount option.  Of course then they will
have the ext3 slower fsync() problem.

You are right that data=writeback and delayed allocation do both mean
that data can get pushed out much later than the metadata.  But that's
allowed by POSIX, and it does give some very nice performance
benefits.

With either data=writeback or delayed allocation, we can also adjust
the default commit interval and the writeback timer settings; if we
say, change the default commit interval to be 30 seconds, and change
the writeback expire interval to be 15 seconds, it will also smooth
out the writes significantly.  So that's yet another solution, with a
different set of tradeoffs.  

Depending on the set of applications someone is running on their
system, running and the reliability of their hardware/power/system in
general, different tradeoffs will be more or less appropriate for the
system administrator in question.

							- Ted


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ