Message-ID: <20130622125604.GD4727@thunk.org>
Date: Sat, 22 Jun 2013 08:56:04 -0400
From: Theodore Ts'o <tytso@....edu>
To: "Sidorov, Andrei" <Andrei.Sidorov@...isi.com>
Cc: "Joseph D. Wagner" <joe@...ephdwagner.info>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Ryan Lortie <desrt@...rt.ca>
Subject: Re: ext4 file replace guarantees
On Fri, Jun 21, 2013 at 09:49:26PM +0000, Sidorov, Andrei wrote:
> But there is no need to mount entire fs with data journalling mode.
> In fact I find per-file data journalling extremely useful. It would
> be even more useful if it allowed regular users to set journalling
> mode on specific file and there was some way to designate rewrite
> transaction boundaries (even 128k would cover a lot of
> small-but-important-file use cases).
Note that at the moment, the +j flag is only honored in nodelalloc
mode. Since delayed allocation is enabled by default, the per-file
data journal flag is ignored. This is something that we could fix, in
theory. It would be possible to teach ext4_writepages how to allocate
the block(s) and write the data block(s) in the same journal
transaction --- but that functionality does not exist today.
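For reference, the +j flag is normally set with chattr(1). The following is a minimal C sketch of the equivalent ioctl calls. It assumes the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS interface and the FS_JOURNAL_DATA_FL bit from <linux/fs.h>. The function names with_journal_data() and set_journal_data() are hypothetical. As noted above, the flag is still ignored on a default (delalloc) mount.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_JOURNAL_DATA_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the inode flag word with the per-file data journaling bit set. */
int with_journal_data(int flags)
{
    return flags | FS_JOURNAL_DATA_FL;
}

/* Roughly what `chattr +j path` does. Returns 0 on success or -errno.
 * On ext4, changing this flag needs CAP_SYS_RESOURCE, and the flag is
 * ignored unless the file system is mounted with nodelalloc. */
int set_journal_data(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    int flags = 0;
    int rc = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
        rc = -errno;
    } else {
        flags = with_journal_data(flags);   /* keep existing flags, add +j */
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            rc = -errno;
    }
    close(fd);
    return rc;
}
```

The flags are read first and OR-ed so that other attributes on the inode (append-only, immutable, and so on) are preserved.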
So if you want to use the +j flag, you have to mount the file system
with the non-standard nodelalloc mount option. And that's actually
sufficient to be bug-for-bug compatible with ext3, in that committing
the transaction which contains the rename operation forces the file's
contents out to disk first.
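Since +j silently does nothing without nodelalloc, a program relying on it may want to check the mount options first. The sketch below uses the <mntent.h> interface (setmntent(), getmntent(), hasmntopt()) to scan /proc/mounts. mounted_nodelalloc() and opts_have_nodelalloc() are hypothetical helper names. Matching the mount point by exact path is a simplifying assumption: the sketch does not work out which mount a given file lives on.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a comma-separated mount option string contains "nodelalloc". */
int opts_have_nodelalloc(const char *opts)
{
    struct mntent m = { .mnt_opts = (char *)opts };   /* hasmntopt only reads it */
    return hasmntopt(&m, "nodelalloc") != NULL;
}

/* Returns 1 if mountpoint is mounted with nodelalloc, 0 if it is not,
 * or -1 if it does not appear in /proc/mounts. */
int mounted_nodelalloc(const char *mountpoint)
{
    FILE *f = setmntent("/proc/mounts", "r");
    if (f == NULL)
        return -1;

    int result = -1;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* Keep scanning: if something is mounted over the same directory,
         * the last entry is the one that is visible. */
        if (strcmp(m->mnt_dir, mountpoint) == 0)
            result = opts_have_nodelalloc(m->mnt_opts);
    }
    endmntent(f);
    return result;
}
```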
Although as both Dave Chinner and I have pointed out, it's a bad idea
for generic applications to depend on file system implementation
details, because we do reserve the right to change those details if
it would help improve the file system's performance or reliability.
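The portable alternative, which does not depend on ext3/ext4 commit ordering, is to make durability explicit. The sketch below uses the commonly recommended sequence: write to a temporary file, fsync() it, rename() it over the target, then fsync the directory. This message does not spell that sequence out. replace_file() and the "<path>.tmp" naming are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* fsync the directory containing path, so a rename in it is durable. */
static int fsync_parent_dir(const char *path)
{
    char *copy = strdup(path);          /* dirname() may modify its argument */
    if (copy == NULL)
        return -1;
    int dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}

/* Replace path with data so that, on a typical journaling file system,
 * a crash leaves either the old or the new contents. Returns 0 or -1. */
int replace_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* 1. The new data must be on disk before the name points at it. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 2. rename() atomically swaps the new file in for other processes. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* 3. Make the rename itself survive a crash. */
    return fsync_parent_dir(path);
}
```

This costs two fsync calls per replace, but it relies only on documented POSIX semantics rather than on how a particular file system orders its commits.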
> As for now it is a best choice for app running with root privileges
> for rewriting files <= page size.
The best choice for an application rewriting files <= a single 4k
block is to use O_DIRECT to rewrite the contents of the file, using a
zero-padded 4k buffer. This is the most performant approach, uses the
fewest write cycles on an SSD, etc.
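A minimal sketch of that O_DIRECT approach follows. It assumes a 4096-byte logical block size for both the buffer alignment and the I/O size, and it assumes the file's format tolerates trailing zero padding, because the file ends up exactly 4096 bytes long. It opens the file without O_TRUNC so the existing block is overwritten in place. The fdatasync() call is an addition not mentioned above. It is included on the assumption that the drive may have a volatile write cache. Whether a single 4k write is atomic across a power failure also depends on the device. pad_block() and rewrite_small_file_direct() are hypothetical names.

```c
#define _GNU_SOURCE            /* O_DIRECT is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 4096             /* assumed logical block size and alignment */

/* Copy len bytes into a BLOCK-sized buffer and zero-pad the rest. */
int pad_block(char *block, const char *data, size_t len)
{
    if (len > BLOCK)
        return -1;
    memcpy(block, data, len);
    memset(block + len, 0, BLOCK - len);
    return 0;
}

/* Overwrite the first (and only) block of path with data, bypassing the
 * page cache. Returns 0 on success, -1 on failure. */
int rewrite_small_file_direct(const char *path, const char *data, size_t len)
{
    if (len > BLOCK) {
        errno = EFBIG;
        return -1;
    }

    void *buf;
    int err = posix_memalign(&buf, BLOCK, BLOCK);   /* O_DIRECT needs aligned memory */
    if (err != 0) {
        errno = err;
        return -1;
    }
    pad_block(buf, data, len);

    /* No O_TRUNC: truncating would free the block and force a reallocation. */
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* One aligned, block-sized write at offset 0, then flush the drive cache. */
    ssize_t n = pwrite(fd, buf, BLOCK, 0);
    int rc = (n == BLOCK) ? fdatasync(fd) : -1;

    close(fd);
    free(buf);
    return rc;
}
```

Some file systems (tmpfs on older kernels, for example) reject O_DIRECT at open() time, so callers may need a buffered fallback.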
- Ted