Message-ID: <20090327032301.GN6239@mit.edu>
Date:	Thu, 26 Mar 2009 23:23:01 -0400
From:	Theodore Tso <tytso@....edu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

On Thu, Mar 26, 2009 at 06:03:15PM -0700, Linus Torvalds wrote:
> 
> Everybody accepts that if you've written a 20MB file and then call 
> "fsync()" on it, it's going to take a while. But when you've written a 2kB 
> file, and "fsync()" takes 20 seconds, because somebody else is just 
> writing normally, _that_ is a bug. And it is actually almost totally 
> unrelated to the whole 'dirty_limit' thing.

Yeah, well, it's caused by data=ordered, which is an ext3-unique
thing; no other filesystem (or operating system) has such a feature.
I'm beginning to wish we hadn't implemented it.  Yeah, it solved a
security problem (which delayed allocation also solves), but it
trained application programs to be careless about fsync(), and it's
caused us so many other problems, including the fsync() and unrelated
commit latency problems.

We are where we are, though, and people have been trained to think
they don't need fsync(), so we're going to have to deal with the
problem by having these implied fsyncs for cases like
replace-via-rename, and in addition to that, some kind of heuristic to
force out writes early to avoid these huge write latencies.  It would
be good to make it autotuning, so that filesystems that don't do
ext3 data=ordered don't have to pay the price of having to force out
writes so aggressively early (since in some cases if the file
subsequently is deleted, we might be able to optimize out the write
altogether --- and that's good for SSD endurance).
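
For reference, the replace-via-rename pattern mentioned above, done
with the explicit fsync() that applications are expected to issue,
might look roughly like the sketch below.  This is an illustration
only, not code from any particular application: the helper name
atomic_replace is hypothetical, it assumes a POSIX system where a
directory can be opened and fsync()ed, and it assumes the new file
may end up with the restrictive permissions of a freshly created
temporary file.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes).

    Hypothetical helper: write a temporary file in the same directory,
    fsync() it, then rename it over the target.
    """
    dirname = os.path.dirname(os.path.abspath(path))

    # The temporary file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the new data to stable storage *before* the rename,
            # rather than relying on the filesystem to imply it.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Assumption: POSIX semantics. Sync the directory so the rename
    # itself is durable as well.
    dir_fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_replace("settings.conf", b"key=value\n") leaves either
the old contents or the new contents under that name, never a
truncated file.  The cost is exactly the fsync() latency discussed
in this thread.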

	     	       		      	  	 - Ted