Message-ID: <20140407124820.GB8468@thunk.org>
Date:	Mon, 7 Apr 2014 08:48:20 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Markus <M4rkusXXL@....de>
Cc:	"Darrick J. Wong" <darrick.wong@...cle.com>,
	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: Dirty ext4 blocks system startup

On Mon, Apr 07, 2014 at 12:58:40PM +0200, Markus wrote:
> 
> Finally e2image finished successfully. But the produced file is way too big to send by mail.
> 
> Any other possibility?
> (e2image does dump everything except file data and free space. But the problem seems to be just in the bitmap and/or journal.)
> 
> Actually, when I look at the code around e2fsck/recovery.c:594:
> the error is detected and continue is called, but tagp/tag is never
> advanced, while the checksum is always compared against the one from
> tag. Is that intended?

What mount options are you using?  It appears that you have journal
checksums enabled, which isn't on by default, and unfortunately,
there's a good reason for that.  The original code assumed that the
most common cause of journal corruption would be an incomplete
journal transaction getting written out, especially if one were using
journal_async_commit.  The feature has not been enabled by default
because the question of what to do when the journal gets corrupted in
any other way is not an easy one.
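
(For reference, the options in question look like this; see
Documentation/filesystems/ext4.txt in the kernel tree for the
details, and substitute whatever device your file system lives on:

	mount -o journal_checksum /dev/sdXN /mnt
	mount -o journal_async_commit /dev/sdXN /mnt

where journal_async_commit also turns on journal checksumming.)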

If some part of a transaction other than the very last one in the
journal gets corrupted, replaying it could do severe damage to the
file system.  Unfortunately, simply deleting the journal and then
recreating it could do further damage.  Most of the time a bad
checksum means that the last transaction hasn't fully made it out to
disk (especially if you use the journal_async_commit option, which is
a bit of a misnomer and has its own caveats[1]).  But if the checksum
violation happens in a transaction that is not the last one in the
journal, the recovery code currently aborts, because we don't have
good automated logic to handle that case.
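
Schematically, and glossing over a lot of detail (this is a sketch of
the policy only; the names and types below are made up and do not
match the actual recovery.c code):

enum replay_action { REPLAY, STOP_HERE, ABORT_RECOVERY };

struct txn {
	int checksum_ok;	/* computed csum matched the commit block */
	int is_last;		/* last transaction in the journal        */
};

static enum replay_action check_txn(const struct txn *t)
{
	if (t->checksum_ok)
		return REPLAY;
	if (t->is_last)
		/* Torn final commit: the common, benign case.  The
		 * fs is consistent as of the previous transaction,
		 * so just stop replay here. */
		return STOP_HERE;
	/* Bad checksum in the middle of the journal: replaying the
	 * transaction could do severe damage, and skipping it
	 * silently is also unsafe, so recovery gives up. */
	return ABORT_RECOVERY;
}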

I suspect if you need to get your file system back on its feet, the
best thing to do is to create a patched e2fsck that doesn't abort when
it finds a checksum error, but instead continues.  Then run it to
replay the journal, and then force a full file system check and hope
for the best.
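
Untested, but in terms of the sketch above the quick hack amounts to
making the middle-of-the-journal case warn and replay anyway, rather
than bailing out:

	/* hacked policy: warn, but replay despite the bad checksum */
	if (!t->checksum_ok && !t->is_last) {
		fprintf(stderr, "transaction has a bad checksum; "
			"replaying it anyway\n");
		return REPLAY;
	}

In the real e2fsck you'd make the equivalent change at the abort site
near the spot you quoted (e2fsck/recovery.c:594).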

What has been on my todo list to implement, but has been relatively
low priority because this is not a feature that we've documented or
encouraged people to use, is to have e2fsck skip any transaction that
has a bad checksum (i.e., not replay it at all), and then force a
full file system check.  This is a bit safer; but even if you make
e2fsck ignore the checksum and replay anyway, you're no worse off
than if journal checksums hadn't been enabled in the first place.
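
That is, again in terms of the sketch above, something like the
following (mark_fs_needs_full_check() is hypothetical; in practice it
would mean setting the error bit in the superblock so that a full
fsck pass gets forced):

	/* safer policy: skip the corrupt transaction entirely,
	 * where SKIP_TXN would be a new replay_action, and make
	 * sure a full check happens afterwards */
	if (!t->checksum_ok && !t->is_last) {
		mark_fs_needs_full_check();
		return SKIP_TXN;	/* don't replay this one */
	}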

The long term thing that we need to add before we can really support
journal checksums is to checksum each individual data block, instead
of just each transaction.  Then when we have a bad checksum, we can
skip just the one bad data block, and then force a full fsck.
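
On disk that would mean each descriptor-block tag carrying a checksum
for its own data block, instead of one checksum covering the whole
transaction; purely as a sketch of the layout (this is not the
current jbd2 format):

	/* hypothetical per-block tag */
	struct journal_block_tag_csum {
		__u32	t_blocknr;	/* where this block belongs    */
		__u32	t_flags;
		__u32	t_checksum;	/* checksum of this block only */
	};

Replay could then verify the tags one at a time and drop only the
blocks whose checksums fail.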

I'm sorry you ran into this.  What I should do is disable these
mount options for now, since users who stumble across them, as you
apparently have, might be tempted to use them and then get into
trouble.

     	      	      	   	      	   	 - Ted

[1] The issue with journal_async_commit is that it's possible (fairly
unlikely, but still possible) that the guarantees of data=ordered
will be violated.  If the data blocks written out while resolving a
delayed allocation writeback haven't made it all the way down to the
platter, it's possible for all of the journal writes and the commit
block to be reordered ahead of the data blocks.  In that case, the
checksum on the commit block would be valid, but some of the data
blocks might never have been written back to disk.
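
To make the ordering concrete: what data=ordered is supposed to
guarantee is

	data blocks  ->  journal blocks  ->  commit block

but with journal_async_commit the commit block is not forced to wait
for the data writes, so after a crash at the wrong moment the disk
can hold the journal blocks and a commit block with a perfectly valid
checksum, while some of the data blocks that commit vouches for were
never written at all.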
