Message-ID: <20100216141854.GT5337@thunk.org>
Date:	Tue, 16 Feb 2010 09:18:54 -0500
From:	tytso@....edu
To:	Jan Kara <jack@...e.cz>
Cc:	Kailas Joshi <kailas.joshi@...il.com>, linux-ext4@...r.kernel.org,
	Jiaying Zhang <jiayingz@...gle.com>
Subject: Re: Help on Implementation of EXT3 type Ordered Mode in EXT4

On Tue, Feb 16, 2010 at 02:10:39PM +0100, Jan Kara wrote:
>   Actually, stalling on a transaction in LOCKED state does have a negative
> impact on the filesystem performance. But it's hard to avoid it. The
> transaction is in LOCKED state while we've decided it needs a commit but
> there are still tasks which have handle to it and are adding new metadata
> buffers to it. So this transaction is effectively still running and we
> cannot start a next transaction because then we'd have two running
> transactions and the journalling logic isn't able to handle that.

This is also why we try to avoid staying in LOCKED state for very
long, and why increasing the journal size can help performance
(since if we get ourselves into trouble where we are forced to do a
journal checkpoint, we can end up stalling all file system updates for
a non-trivial amount of time).

So changes that increase the amount of time that we spend in LOCKED
are going to be really bad, especially if you have one thread which is
frequently calling fsync() (for example, like Firefox, which can be
*very* fsync() happy) and another thread which is doing lots of file
creates and deletes.  Each fsync() will force a transaction commit,
and if you have to stop all transaction updates while the delayed
allocation blocks are getting resolved, life can really get bad.

This is why, ultimately, we really need to distinguish between files
where we might not care when they get written to disk (e.g., object
files being created by the compiler, or ISO files being downloaded
from the web, since we can always restart them after the hopefully
rare crash; unless you're using crappy video drivers, of course) and
files written by buggy applications which are precious and yet where
the application writer didn't bother to use fsync().
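
As a minimal sketch of what properly fsync()'ing a precious file can
look like from the application side: the helper below writes to a
temporary file, calls fsync() on it, and only then renames it over the
old copy.  The name write_precious() is hypothetical, and the extra
fsync() on the containing directory is an assumption about what a
careful application would want on Linux, not something stated above.

```python
import os
import tempfile


def write_precious(path, data):
    """Replace `path` with `data` (bytes), forcing the data to disk first.

    Hypothetical helper for illustration only.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the
    # rename below stays within one file system.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to write it to disk
        os.replace(tmp, path)      # atomically swap in the new contents
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also fsync the directory so the rename itself is durable.
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

An application that skips the os.fsync() call here is exactly the kind
that depends on the file system flushing its data for it.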

Maybe something we ought to consider is doing things both ways.  Maybe
we should have a way for applications to indicate they have been
audited and any precious files will be properly fsync()'ed.  This
could be done via two process personality flags; one which is
inherited across an exec, and one which isn't.  (We need this so
that jobs being fired out of make can be properly exempted from
calling fsync(), even if they are using programs like sort, or shell
redirections, where the coreutils authors don't know whether the files
they are writing are precious or not, and thus whether they should be
fsync'ed.)

These flags would be used to exempt processes from a mount option
which could be set by people who are nervous about not trusting their
application writers, which would force an fsync at every file close
(except for those processes which have these process personality flags
set).  People who are more confident about having a stable set of
kernel drivers (and/or who are running servers where they have UPS's
and where they aren't using crappy desktop applications that seem to
be the most likely to not properly call fsync for precious files) can
simply avoid using this mount option, but we can give users and system
administrators a choice.
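
The interaction between the proposed mount option and the two
personality flags can be sketched as a small decision function.  This
is only an illustration of the policy described above, not kernel
code: the flag names AUDITED and AUDITED_INHERIT and both function
names are hypothetical, since no such flags or mount option exist.

```python
from enum import Flag, auto


class FsyncExempt(Flag):
    """Hypothetical per-process personality flags."""
    NONE = 0
    AUDITED = auto()          # applies to this program only; dropped on exec
    AUDITED_INHERIT = auto()  # survives exec, e.g. set by make for its jobs


def flags_after_exec(flags):
    """Return the flags a process keeps after an exec."""
    return flags & FsyncExempt.AUDITED_INHERIT


def force_fsync_on_close(mount_option_set, flags):
    """Decide whether closing a file should imply an fsync.

    Only unflagged processes on a file system mounted with the
    (hypothetical) option get the forced fsync.
    """
    if not mount_option_set:
        return False
    exempt = flags & (FsyncExempt.AUDITED | FsyncExempt.AUDITED_INHERIT)
    return not exempt
```

Under this sketch, a make process carrying AUDITED_INHERIT passes its
exemption on to sort or to a shell redirection, while a program that
only set AUDITED on itself loses the exemption as soon as it execs
something else.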

Maybe, just for those whiners at Phoronix, we can give them a mount
option where applications which have this flag set will get delayed
allocation, and applications which don't will get their files written
with O_SYNC.  :-)

						- Ted