Message-ID: <2164274.jmlex94sWc@web.de>
Date: Mon, 07 Apr 2014 16:06:50 +0200
From: Markus <M4rkusXXL@....de>
To: Theodore Ts'o <tytso@....edu>
Cc: "Darrick J. Wong" <darrick.wong@...cle.com>,
linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: Dirty ext4 blocks system startup
Theodore Ts'o wrote on 07.04.2014:
> On Mon, Apr 07, 2014 at 12:58:40PM +0200, Markus wrote:
> >
> > Finally e2image finished successfully. But the produced file is way
> > too big for a mail.
> >
> > Any other possibility?
> > (e2image does dump everything except file data and free space. But
> > the problem seems to be just in the bitmap and/or journal.)
> >
> > Actually, when I look at the code around e2fsck/recovery.c:594
> > The error is detected and continue is called.
> > But tagp/tag is never advanced, yet the checksum is always compared
> > to the one from the same tag. Intended?
>
> What mount options are you using? It appears that you have journal
> checksums enabled, which isn't on by default, and unfortunately,
> there's a good reason for that. The original code assumed that the
> most common case for journal corruption would be caused by an
> incomplete journal transaction getting written out if one were using
> journal_async_commit. This feature has not been enabled by default
> because the question of what to do when the journal gets corrupted in
> other cases is not an easy one.
Normally just "noatime,journal_checksum", but with the corrupted journal I use
"ro,noload".
The "man mount" reads well about that "journal_checksum" option ;)
> If some part of a transaction which is not the very last transaction
> in the journal gets corrupted, replaying it could do severe damage to
> the file system. Unfortunately, simply deleting the journal and then
> recreating it could also do more damage as well. Most of the time, a
> bad checksum happens because the last transaction hasn't fully made it
> out to disk (especially if you use the journal_async_commit option,
> which is a bit of a misnomer and has its own caveats[1]). But if the
> checksum violation happens in a journal transaction that is not the
> last transaction in the journal, right now the recovery code aborts,
> because we don't have good automated logic to handle this case.
The recovery does not seem to abort. It calls continue and gets caught
in an endless loop.
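For reference, the shape of the loop (a condensed sketch of do_one_pass()
in recovery.c, PASS_REPLAY; unrelated error handling elided):

	while ((tagp - bh->b_data + tag_bytes) <= journal->j_blocksize) {
		tag = (journal_block_tag_t *) tagp;
		io_block = next_log_block++;	/* log position advances */
		/* ... jread() io_block into obh ... */
		if (!jbd2_block_tag_csum_verify(
			journal, tag, obh->b_data,
			be32_to_cpu(tmp->h_sequence))) {
			brelse(obh);
			success = -EIO;
			printk(KERN_ERR "JBD: Invalid checksum ...\n");
			continue;	/* back to the top, tagp unchanged */
		}
		/* ... replay the block ... */
	skip_write:
		tagp += tag_bytes;	/* never reached via the continue */
		if (flags & JBD2_FLAG_LAST_TAG)
			break;
	}

So every iteration reads the next log block but verifies it against the
same tag's checksum, which can never succeed again.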
> I suspect if you need to get your file system back on its feet, the
> best thing to do is to create a patched e2fsck that doesn't abort when
> it finds a checksum error, but instead continues. Then run it to
> replay the journal, and then force a full file system check and hope
> for the best.
The code calls "continue". ;)
So I just remove the error handling from the if clause:
	/* Look for block corruption */
	if (!jbd2_block_tag_csum_verify(
		journal, tag, obh->b_data,
		be32_to_cpu(tmp->h_sequence))) {
-		brelse(obh);
-		success = -EIO;
		printk(KERN_ERR "JBD: Invalid "
		       "checksum recovering "
		       "block %lld in log\n",
		       blocknr);
-		continue;
	}
It would then ignore the checksum and just issue a message. Right?
> What has been on my todo list to implement, but has been relatively
> low priority because this is not a feature that we've documented or
> encouraged people to use, is to have e2fsck skip the transaction that has a
> bad checksum (i.e., not replay it at all), and then force a full file
> system check. This is a bit safer, but if you make e2fsck ignore the
> checksum, it's no worse than if journal checksums weren't enabled in
> the first place.
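Something like this, perhaps? (Just a sketch on my side; block_error and
the skip_transaction label are made-up names, not existing code:)

		if (!jbd2_block_tag_csum_verify(
			journal, tag, obh->b_data,
			be32_to_cpu(tmp->h_sequence))) {
			printk(KERN_ERR "JBD: bad checksum in "
			       "transaction %u, skipping it\n",
			       next_commit_ID);
			brelse(obh);
			block_error = 1;	/* made up: force full fsck */
			goto skip_transaction;	/* made up: label past the tag loop */
		}

(To really not replay the transaction at all, the checksums would
presumably have to be verified in a scan pass before any block is
written back.)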
>
> The long term thing that we need to add before we can really support
> journal checksums is to checksum each individual data block, instead
> of just each transaction. Then when we have a bad checksum, we can
> skip just the one bad data block, and then force a full fsck.
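That skip-one-block behaviour might look roughly like this (a sketch;
need_full_fsck is an assumed flag, only skip_write is a real label in
do_one_pass()):

		/* a checksum over just this data block failed, so skip
		 * only this block and keep replaying the rest */
		if (!jbd2_block_tag_csum_verify(
			journal, tag, obh->b_data,
			be32_to_cpu(tmp->h_sequence))) {
			need_full_fsck = 1;	/* assumed: force fsck later */
			brelse(obh);
			goto skip_write;	/* advance tagp, next tag */
		}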
>
> I'm sorry you ran into this. What I should do is to disable these
> mount options for now, since users who stumble across them, as
> apparently you have, might be tempted to use them, and then get into
> trouble.
>
> - Ted
>
> [1] The issue with journal_async_commit is that it's possible (fairly
> unlikely, but still possible) that the guarantees of data=ordered will
> be violated. If the data blocks that were written out while we are
> resolving a delayed allocation writeback haven't made it all the way
> down to the platter, it's possible for all of the journal writes and
> the commit block to be reordered ahead of the data blocks. In that
> case, the checksum for the commit block would be valid, but some of
> the data blocks might not have been written back to disk.
Thanks so far,
Markus