Date:	Sat, 28 Mar 2009 21:18:51 -0400
From:	Jeff Garzik <jeff@...zik.org>
To:	Mark Lord <lkml@....ca>
CC:	Stefan Richter <stefanr@...6.in-berlin.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Theodore Tso <tytso@....edu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29

Mark Lord wrote:
> The better solution seems to be the rather obvious one:
> 
>   the filesystem should commit data to disk before altering metadata.
> 
> Much easier and more reliable to centralize it there, rather than
> rely (falsely) upon thousands of programs each performing numerous
> performance-killing fsync's.

Firstly, the FS data/metadata write-out order says nothing about when 
the write-out is started by the OS.  It only implies consistency in the 
face of a crash during write-out.  Hooray for BSD soft-updates.

If the write-out is started immediately during or after write(2), 
congratulations, you are on your way to reinventing synchronous writes.

If the write-out does not start immediately, then you have a 
many-seconds window for data loss.  And it should be self-evident that 
userland application writers will have some situations where design 
requirements dictate minimizing or eliminating that window.


Secondly, this email sub-thread is not talking about thousands of 
programs, it is talking about Firefox behavior.  Firefox is a multi-OS 
portable application that has a design requirement that user data must 
be protected against crashes.  (same concept as your word processor's 
auto-save feature)

The author of such a portable application must ensure their app saves 
data against Windows Vista kernel crashes, HPUX kernel crashes, OS X 
window system crashes, X11 window system crashes, application crashes, etc.

Can a portable app really rely on what Linux kernel hackers think the 
underlying filesystem _should_ do?

No, it is either (a) not going to care at all, or (b) going to use fsync(2) or 
FlushFileBuffers() because of the guarantees they provide across the OS 
spectrum, in light of the myriad OS filesystem caching, flushing, and 
ordering algorithms.
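
To make option (b) concrete, here is a minimal sketch of the kind of wrapper a portable app might carry. The name portable_flush() is hypothetical and is not taken from Firefox or any real code base; the sketch assumes a POSIX or Win32 environment and only uses fflush(), fsync(2), _get_osfhandle() and FlushFileBuffers().

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper (not from any real application): drain the stdio
 * buffer, then ask the OS to push the file to stable storage.
 * Returns 0 on success, -1 on failure. */
int portable_flush(FILE *fp)
{
    /* Step 1: move data from the userland stdio buffer into the kernel. */
    if (fflush(fp) != 0)
        return -1;
#ifdef _WIN32
    /* Step 2 (Windows): translate the CRT descriptor to a HANDLE and flush. */
    HANDLE h = (HANDLE)_get_osfhandle(_fileno(fp));
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    return FlushFileBuffers(h) ? 0 : -1;
#else
    /* Step 2 (POSIX): fsync(2) returns only after the flush completes. */
    return fsync(fileno(fp));
#endif
}
```

The point of the wrapper is that the app asks for the flush explicitly and does not depend on any one filesystem's write-out policy.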



Was the BSD soft-updates idea of FS data-before-metadata a good one? 
Yes.  Obviously.

It is the cornerstone of every SANE journalling-esque database or 
filesystem out there -- don't leave a window where your metadata is 
inconsistent.  "Duh" :)

But that says nothing about when a userland app's design requirements 
include ordered writes+flushes of its own application data.  That is the 
common case when a userland app like Firefox uses a transactional 
database such as sqlite or db4.

Thus it is the height of silliness to think that FS data/metadata 
write-out order permits elimination of fsync(2) for the class of 
application that must care about ordered writes/flushes of its own 
application data.
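
As a rough illustration of "ordered writes/flushes of its own application data", the sketch below uses fsync(2) as a barrier between a journal write and the data write. It is a generic write-ahead pattern under assumed names (write_all(), ordered_commit(), and the two file paths are all hypothetical), not how sqlite or db4 actually lay out their files.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical two-step commit: the journal record must be on stable
 * storage before the main data file is touched.  Returns 0 or -1. */
int ordered_commit(const char *journal_path, const char *data_path,
                   const char *record)
{
    int rc = -1;
    int jfd = open(journal_path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (jfd < 0)
        return -1;
    int dfd = open(data_path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (dfd < 0) {
        close(jfd);
        return -1;
    }

    if (write_all(jfd, record, strlen(record)) == 0 &&
        fsync(jfd) == 0 &&      /* barrier: journal is durable first */
        write_all(dfd, record, strlen(record)) == 0 &&
        fsync(dfd) == 0)        /* then the application data itself */
        rc = 0;

    close(dfd);
    close(jfd);
    return rc;
}
```

The ordering here is between two of the application's own files, which is something the filesystem's internal data-before-metadata rule has no way to know about.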

That upstream sqlite replaced fsync(2) with fdatasync(2) makes it 
obvious that FS data/metadata write-out order is irrelevant to Firefox.
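
For reference, fdatasync(2) flushes the file data plus only the metadata needed to read that data back (such as the file size), and may skip things like timestamps. A minimal sketch, assuming a system that provides fdatasync(2) (Linux does); append_durable() is a hypothetical name and is not sqlite's code:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append one record and wait until the data
 * (not necessarily every piece of metadata) is on stable storage.
 * Returns 0 on success, -1 on failure or short write. */
int append_durable(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len)
        return -1;
    /* Data-only flush: the app cares that its bytes are safe. */
    return fdatasync(fd);
}
```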

The issue with transactional databases is more simply a design tradeoff 
-- level of fsync punishment versus performance etc.  Tweaking the OS 
filesystem doesn't help at all with those design choices.

	Jeff


