linux-ext4 - Re: [PATCH v3 1/2] jbd2: continue to record log between each mount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230315094826.okdarxaapjyqmlhq@quack3>
Date:   Wed, 15 Mar 2023 10:48:26 +0100
From:   Jan Kara <jack@...e.cz>
To:     Zhang Yi <yi.zhang@...weicloud.com>
Cc:     linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com,
        yukuai3@...wei.com
Subject: Re: [PATCH v3 1/2] jbd2: continue to record log between each mount

On Tue 14-03-23 22:05:21, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
> 
> For a newly mounted file system, the journal committing thread always
> record new transactions from the start of the journal area, no matter
> whether the journal was clean or just has been recovered. So the logdump
> code in debugfs cannot dump continuous logs between each mount, it is
> disadvantageous to analysis corrupted file system image and locate the
> file system inconsistency bugs.
> 
> If we get a corrupted file system in the running products and want to
> find out what has happened, besides lookup the system log, one effective
> way is to backtrack the journal log. But we may not always run e2fsck
> before each mount and the default fsck -a mode also cannot always
> checkout all inconsistencies, so it could left over some inconsistencies
> into the next mount until we detect it. Finally, transactions in the
> journal may probably discontinuous and some relatively new transactions
> has been covered, it becomes hard to analyse. If we could record
> transactions continuously between each mount, we could acquire more
> useful info from the journal. Like this:
> 
>  |Previous mount checkpointed/recovered logs|Current mount logs         |
>  |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
> 
> And yes the journal area is limited and cannot record everything, the
> problematic transaction may also be covered even if we do this, but
> this is still useful for fuzzy tests and short-running products.
> 
> This patch save the head blocknr in the superblock after flushing the
> journal or unmounting the file system, let the next mount could continue
> to record new transaction behind it. This change is backward compatible
> because the old kernel does not care about the head blocknr of the
> journal. It is also fine if we mount a clean old image without valid
> head blocknr, we fail back to set it to s_first just like before.
> Finally, for the case of mount an unclean file system, we could also get
> the journal head easily after scanning/replaying the journal, it will
> continue to record new transaction after the recovered transactions.
> 
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>

I like this implementation! I even think we could perhaps make ext4 always
behave this way to not increase size of the test matrix. Or do you see any
downside to this option?

								Honza

> ---
>  fs/jbd2/journal.c    | 18 ++++++++++++++++--
>  fs/jbd2/recovery.c   | 22 +++++++++++++++++-----
>  include/linux/jbd2.h |  9 +++++++--
>  3 files changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index e80c781731f8..c57ab466fc18 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1556,8 +1556,21 @@ static int journal_reset(journal_t *journal)
>  	journal->j_first = first;
>  	journal->j_last = last;
>  
> -	journal->j_head = journal->j_first;
> -	journal->j_tail = journal->j_first;
> +	if (journal->j_head != 0 && journal->j_flags & JBD2_CYCLE_RECORD) {
> +		/*
> +		 * Disable the cycled recording mode if the journal head block
> +		 * number is not correct.
> +		 */
> +		if (journal->j_head < first || journal->j_head >= last) {
> +			printk(KERN_WARNING "JBD2: Incorrect Journal head block %lu, "
> +			       "disable journal_cycle_record\n",
> +			       journal->j_head);
> +			journal->j_head = journal->j_first;
> +		}
> +	} else {
> +		journal->j_head = journal->j_first;
> +	}
> +	journal->j_tail = journal->j_head;
>  	journal->j_free = journal->j_last - journal->j_first;
>  
>  	journal->j_tail_sequence = journal->j_transaction_sequence;
> @@ -1729,6 +1742,7 @@ static void jbd2_mark_journal_empty(journal_t *journal, blk_opf_t write_flags)
>  
>  	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
>  	sb->s_start    = cpu_to_be32(0);
> +	sb->s_head     = cpu_to_be32(journal->j_head);
>  	if (jbd2_has_feature_fast_commit(journal)) {
>  		/*
>  		 * When journal is clean, no need to commit fast commit flag and
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 8286a9ec122f..0184931d47f7 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -29,6 +29,7 @@ struct recovery_info
>  {
>  	tid_t		start_transaction;
>  	tid_t		end_transaction;
> +	unsigned long	head_block;
>  
>  	int		nr_replays;
>  	int		nr_revokes;
> @@ -301,11 +302,11 @@ int jbd2_journal_recover(journal_t *journal)
>  	 * is always zero if, and only if, the journal was cleanly
>  	 * unmounted.
>  	 */
> -
>  	if (!sb->s_start) {
> -		jbd2_debug(1, "No recovery required, last transaction %d\n",
> -			  be32_to_cpu(sb->s_sequence));
> +		jbd2_debug(1, "No recovery required, last transaction %d, head block %u\n",
> +			  be32_to_cpu(sb->s_sequence), be32_to_cpu(sb->s_head));
>  		journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
> +		journal->j_head = be32_to_cpu(sb->s_head);
>  		return 0;
>  	}
>  
> @@ -324,6 +325,9 @@ int jbd2_journal_recover(journal_t *journal)
>  	/* Restart the log at the next transaction ID, thus invalidating
>  	 * any existing commit records in the log. */
>  	journal->j_transaction_sequence = ++info.end_transaction;
> +	journal->j_head = info.head_block;
> +	jbd2_debug(1, "JBD2: last transaction %d, head block %lu\n",
> +		  journal->j_transaction_sequence, journal->j_head);
>  
>  	jbd2_journal_clear_revoke(journal);
>  	err2 = sync_blockdev(journal->j_fs_dev);
> @@ -364,6 +368,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
>  	if (err) {
>  		printk(KERN_ERR "JBD2: error %d scanning journal\n", err);
>  		++journal->j_transaction_sequence;
> +		journal->j_head = journal->j_first;
>  	} else {
>  #ifdef CONFIG_JBD2_DEBUG
>  		int dropped = info.end_transaction - 
> @@ -373,6 +378,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
>  			  dropped, (dropped == 1) ? "" : "s");
>  #endif
>  		journal->j_transaction_sequence = ++info.end_transaction;
> +		journal->j_head = info.head_block;
>  	}
>  
>  	journal->j_tail = 0;
> @@ -462,7 +468,7 @@ static int do_one_pass(journal_t *journal,
>  			struct recovery_info *info, enum passtype pass)
>  {
>  	unsigned int		first_commit_ID, next_commit_ID;
> -	unsigned long		next_log_block;
> +	unsigned long		next_log_block, head_block;
>  	int			err, success = 0;
>  	journal_superblock_t *	sb;
>  	journal_header_t *	tmp;
> @@ -485,6 +491,7 @@ static int do_one_pass(journal_t *journal,
>  	sb = journal->j_superblock;
>  	next_commit_ID = be32_to_cpu(sb->s_sequence);
>  	next_log_block = be32_to_cpu(sb->s_start);
> +	head_block = next_log_block;
>  
>  	first_commit_ID = next_commit_ID;
>  	if (pass == PASS_SCAN)
> @@ -809,6 +816,7 @@ static int do_one_pass(journal_t *journal,
>  				if (commit_time < last_trans_commit_time)
>  					goto ignore_crc_mismatch;
>  				info->end_transaction = next_commit_ID;
> +				info->head_block = head_block;
>  
>  				if (!jbd2_has_feature_async_commit(journal)) {
>  					journal->j_failed_commit =
> @@ -817,8 +825,10 @@ static int do_one_pass(journal_t *journal,
>  					break;
>  				}
>  			}
> -			if (pass == PASS_SCAN)
> +			if (pass == PASS_SCAN) {
>  				last_trans_commit_time = commit_time;
> +				head_block = next_log_block;
> +			}
>  			brelse(bh);
>  			next_commit_ID++;
>  			continue;
> @@ -868,6 +878,8 @@ static int do_one_pass(journal_t *journal,
>  	if (pass == PASS_SCAN) {
>  		if (!info->end_transaction)
>  			info->end_transaction = next_commit_ID;
> +		if (!info->head_block)
> +			info->head_block = head_block;
>  	} else {
>  		/* It's really bad news if different passes end up at
>  		 * different places (but possible due to IO errors). */
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 5962072a4b19..475f135260c9 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -265,8 +265,10 @@ typedef struct journal_superblock_s
>  	__u8	s_padding2[3];
>  /* 0x0054 */
>  	__be32	s_num_fc_blks;		/* Number of fast commit blocks */
> -/* 0x0058 */
> -	__u32	s_padding[41];
> +	__be32	s_head;			/* blocknr of head of log, only uptodate
> +					 * while the filesystem is clean */
> +/* 0x005C */
> +	__u32	s_padding[40];
>  	__be32	s_checksum;		/* crc32c(superblock) */
>  
>  /* 0x0100 */
> @@ -1392,6 +1394,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
>  #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
>  						 * data write error in ordered
>  						 * mode */
> +#define JBD2_CYCLE_RECORD		0x080	/* Journal cycled record log on
> +						 * clean and empty filesystem
> +						 * logging area */
>  #define JBD2_FAST_COMMIT_ONGOING	0x100	/* Fast commit is ongoing */
>  #define JBD2_FULL_COMMIT_ONGOING	0x200	/* Full commit is ongoing */
>  #define JBD2_JOURNAL_FLUSH_DISCARD	0x0001
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR