linux-ext4 - Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2D8D1A30-C092-4163-B47A-BCEDACE536A3@boeing.com>
Date:	Mon, 27 Jun 2011 23:21:17 -0500
From:	"Moffett, Kyle D" <Kyle.D.Moffett@...ing.com>
To:	"Ted Ts'o" <tytso@....edu>
CC:	Lukas Czerner <lczerner@...hat.com>, Jan Kara <jack@...e.cz>,
	Sean Ryle <seanbo@...il.com>,
	"615998@...s.debian.org" <615998@...s.debian.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Sachin Sant <sachinp@...ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel
 BUG at fs/jbd2/commit.c:534" from Postfix on ext4

On Jun 27, 2011, at 12:01, Ted Ts'o wrote:
> On Mon, Jun 27, 2011 at 05:30:11PM +0200, Lukas Czerner wrote:
>>> I've found some. So although data=journal users are minority, there are
>>> some. That being said I agree with you we should do something about it
>>> - either state that we want to fully support data=journal - and then we
>>> should really do better with testing it or deprecate it and remove it
>>> (which would save us some complications in the code).
>>> 
>>> I would be slightly in favor of removing it (code simplicity, less options
>>> to configure for admin, less options to test for us, some users I've come
>>> across actually were not quite sure why they are using it - they just
>>> thought it looks safer).
> 
> Hmm...  FYI, I hope to be able to bring on line automated testing for
> ext4 later this summer (there's a testing person at Google is has
> signed up to work on setting this up as his 20% project).  The test
> matrix that I have him included data=journal, so we will be getting
> better testing in the near future.
> 
> At least historically, data=journalling was the *simpler* case, and
> was the first thing supported by ext4.  (data=ordered required revoke
> handling which didn't land for six months or so).  So I'm not really
> that convinced that removing really buys us that much code
> simplification.
> 
> That being siad, it is true that data=journalled isn't necessarily
> faster.  For heavy disk-bound workloads, it can be slower.  So I can
> imagine adding some documentation that warns people not to use
> data=journal unless they really know what they are doing, but at least
> personally, I'm a bit reluctant to dispense with a bug report like
> this by saying, "oh, that feature should be deprecated".

I suppose I should chime in here, since I'm the one who (potentially
incorrectly) thinks I should be using data=journalled mode.

My basic impression is that the use of "data=journalled" can help
reduce the risk (slightly) of serious corruption to some kinds of
databases when the application does not provide appropriate syncs
or journalling on its own (IE: such as text-based Wiki database files).

Please correct me if this is horribly horribly wrong:

no journal:
  Nothing is journalled
  + Very fast.
  + Works well for filesystems that are "mkfs"ed on every boot
  - Have to fsck after every reboot

data=writeback:
  Metadata is journalled, data (to allocated extents) may be written
  before or after the metadata is updated with a new file size.
  + Fast (not as fast as unjournalled)
  + No need to "fsck" after a hard power-down
  - A crash or power failure in the middle of a write could leave
    old data on disk at the end of a file.  If security labeling
    such as SELinux is enabled, this could "contaminate" a file with
    data from a deleted file that was at a higher sensitivity.
    Log files (including binary database replication logs) may be
    effectively corrupted as a result.

data=ordered:
  Data appended to a file will be written before the metadata
  extending the length of the file is written, and in certain cases
  the data will be written before file renames (partial ordering),
  but the data itself is unjournalled, and may be only partially
  complete for updates.
  + Does not write data to the media twice
  + A crash or power failure will not leave old uninitialized data
    in files.
  - Data writes to files may only partially complete in the event
    of a crash.  No problems for logfiles, or self-journalled
    application databases, but others may experience partial writes
    in the event of a crash and need recovery.

data=journalled:
  Data and metadata are both journalled, meaning that a given data
  write will either complete or it will never occur, although the
  precise ordering is not guaranteed.  This also implies all of the
  data<=>metadata guarantees of data=ordered.
  + Direct IO data writes are effectively "atomic", resulting in
    less likelihood of data loss for application databases which do
    not do their own journalling.  This means that a power failure
    or system crash will not result in a partially-complete write.
  - Cached writes are not atomic
  + For small cached file writes (of only a few filesystem pages)
    there is a good chance that kernel writeback will queue the
    entire write as a single I/O and it will be "protected" as a
    result.  This helps reduce the chance of serious damage to some
    text-based database files (such as those for some Wikis), but
    is obviously not a guarantee.
  - This writes all data to the block device twice (once to the FS
    journal and once to the data blocks).  This may be especially bad
    for write-limited Flash-backed devices.

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html