Message-ID: <2D8D1A30-C092-4163-B47A-BCEDACE536A3@boeing.com>
Date: Mon, 27 Jun 2011 23:21:17 -0500
From: "Moffett, Kyle D" <Kyle.D.Moffett@...ing.com>
To: "Ted Ts'o" <tytso@....edu>
CC: Lukas Czerner <lczerner@...hat.com>, Jan Kara <jack@...e.cz>,
Sean Ryle <seanbo@...il.com>,
"615998@...s.debian.org" <615998@...s.debian.org>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Sachin Sant <sachinp@...ibm.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel
BUG at fs/jbd2/commit.c:534" from Postfix on ext4
On Jun 27, 2011, at 12:01, Ted Ts'o wrote:
> On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
>>> I've found some. So although data=journal users are minority, there are
>>> some. That being said I agree with you we should do something about it
>>> - either state that we want to fully support data=journal - and then we
>>> should really do better with testing it or deprecate it and remove it
>>> (which would save us some complications in the code).
>>>
>>> I would be slightly in favor of removing it (code simplicity, less options
>>> to configure for admin, less options to test for us, some users I've come
>>> across actually were not quite sure why they are using it - they just
>>> thought it looks safer).
>
> Hmm... FYI, I hope to be able to bring on line automated testing for
> ext4 later this summer (there's a testing person at Google who has
> signed up to work on setting this up as his 20% project). The test
> matrix that I gave him included data=journal, so we will be getting
> better testing in the near future.
>
> At least historically, data=journalling was the *simpler* case, and
> was the first thing supported by ext4. (data=ordered required revoke
> handling which didn't land for six months or so). So I'm not really
> that convinced that removing it really buys us that much code
> simplification.
>
> That being said, it is true that data=journalled isn't necessarily
> faster. For heavy disk-bound workloads, it can be slower. So I can
> imagine adding some documentation that warns people not to use
> data=journal unless they really know what they are doing, but at least
> personally, I'm a bit reluctant to dispense with a bug report like
> this by saying, "oh, that feature should be deprecated".

I suppose I should chime in here, since I'm the one who (potentially
incorrectly) thinks I should be using data=journal mode.

My basic impression is that the use of "data=journal" can help
reduce the risk (slightly) of serious corruption to some kinds of
databases when the application does not provide appropriate syncs
or journalling on its own (e.g. text-based Wiki database files).

Please correct me if this is horribly wrong:

no journal:
  Nothing is journalled
  + Very fast.
  + Works well for filesystems that are "mkfs"ed on every boot
  - Have to fsck after every reboot

data=writeback:
  Metadata is journalled, data (to allocated extents) may be written
  before or after the metadata is updated with a new file size.
  + Fast (not as fast as unjournalled)
  + No need to "fsck" after a hard power-down
  - A crash or power failure in the middle of a write could leave
    old data on disk at the end of a file. If security labeling
    such as SELinux is enabled, this could "contaminate" a file with
    data from a deleted file that was at a higher sensitivity.
    Log files (including binary database replication logs) may be
    effectively corrupted as a result.

data=ordered:
  Data appended to a file will be written before the metadata
  extending the length of the file is written, and in certain cases
  the data will be written before file renames (partial ordering),
  but the data itself is unjournalled, and may be only partially
  complete for updates.
  + Does not write data to the media twice
  + A crash or power failure will not leave old uninitialized data
    in files.
  - Data writes to files may only partially complete in the event
    of a crash. No problems for logfiles, or self-journalled
    application databases, but others may experience partial writes
    in the event of a crash and need recovery.

data=journal:
  Data and metadata are both journalled, meaning that a given data
  write will either complete or it will never occur, although the
  precise ordering is not guaranteed. This also implies all of the
  data<=>metadata guarantees of data=ordered.
  + Direct IO data writes are effectively "atomic", resulting in
    less likelihood of data loss for application databases which do
    not do their own journalling. This means that a power failure
    or system crash will not result in a partially-complete write.
  - Cached writes are not atomic
  + For small cached file writes (of only a few filesystem pages)
    there is a good chance that kernel writeback will queue the
    entire write as a single I/O and it will be "protected" as a
    result. This helps reduce the chance of serious damage to some
    text-based database files (such as those for some Wikis), but
    is obviously not a guarantee.
  - This writes all data to the block device twice (once to the FS
    journal and once to the data blocks). This may be especially bad
    for write-limited Flash-backed devices.
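
For comparison, the application-side alternative to leaning on the
journal mode is the usual write-temp, fsync, rename sequence. What
follows is only a minimal sketch of that pattern in Python, not
anything taken from the Wiki software in question. The helper name
atomic_replace is made up for illustration, and it assumes a POSIX
system where a directory can be opened and fsync()ed.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes), all-or-nothing.

    Hypothetical helper for illustration only; assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so that the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new data blocks to the media
        os.replace(tmp_path, path)  # rename(2): swap in the new file
    except BaseException:
        # On failure, remove the temporary file and leave `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so that the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A text-based Wiki that saved pages through something like
atomic_replace("page.txt", new_contents) should see either the old or
the new page after a crash even on data=ordered, at the cost of an
fsync per save.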
Cheers,
Kyle Moffett