linux-ext4 - Re: 答复: 答复: [PATCH] jbd2: avoid transaction reuse after reformatting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200918130252.GG18920@quack2.suse.cz>
Date:   Fri, 18 Sep 2020 15:02:52 +0200
From:   Jan Kara <jack@...e.cz>
To:     常凤楠 <changfengnan@...vision.com>
Cc:     Jan Kara <jack@...e.cz>, changfengnan <changfengnan@...com>,
        "adilger@...ger.ca" <adilger@...ger.ca>,
        "darrick.wong@...cle.com" <darrick.wong@...cle.com>,
        "jack@...e.com" <jack@...e.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>
Subject: Re: 答复: 答复: [PATCH] jbd2:
 avoid transaction reuse after reformatting

Hello,

On Fri 18-09-20 01:49:09, 常凤楠 wrote:
> Sorry about my mailer, the patch is in the attachment.

Thanks for the patch. Functionally the patch looks mostly OK now. The only
concern I have is that it handles checksum failures only in
JBD2_DESCRIPTOR_BLOCK. This is the most likely case but it could also
happen that JBD2_REVOKE_BLOCK or JBD2_COMMIT_BLOCK is the first one you see
with mismatching checksum. So I think you need to handle these cases as
well. I think your ri_commit_block logic below is an attempt to deal with
these cases (but it's difficult to be sure because of complete lack of
comments) but it is not reliable. A valid transaction can begin both with a
descriptor or with a revoke block.

A few other comments mostly about coding style below:

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index a4967b27ffb6..f7702e14077f 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -417,7 +417,7 @@ static int do_one_pass(journal_t *journal,
 			struct recovery_info *info, enum passtype pass)
 {
 	unsigned int		first_commit_ID, next_commit_ID;
-	unsigned long		next_log_block;
+	unsigned long		next_log_block, ri_commit_block = 0;
 	int			err, success = 0;
 	journal_superblock_t *	sb;
 	journal_header_t *	tmp;
@@ -428,7 +428,9 @@ static int do_one_pass(journal_t *journal,
 	__u32			crc32_sum = ~0; /* Transactional Checksums */
 	int			descr_csum_size = 0;
 	int			block_error = 0;
-
+	bool		need_check_commit_time = false;
+	__be64		last_trans_commit_time;

All variable names in this function seem to be indented by one more
column. Please keep the indentation.

+	

This empty line has whitespace on it. Please delete.

 	/*
 	 * First thing is to establish what we expect to find in the log
 	 * (in terms of transaction IDs), and where (in terms of log
@@ -514,18 +516,18 @@ static int do_one_pass(journal_t *journal,
 		switch(blocktype) {
 		case JBD2_DESCRIPTOR_BLOCK:
 			/* Verify checksum first */
+			if(pass == PASS_SCAN) 
			  ^ Coding style requires space before opening (.
You have this problem at multiple places.

+				ri_commit_block = 0;
+
 			if (jbd2_journal_has_csum_v2or3(journal))
 				descr_csum_size =
 					sizeof(struct jbd2_journal_block_tail);
 			if (descr_csum_size > 0 &&
 			    !jbd2_descriptor_block_csum_verify(journal,
 							       bh->b_data)) {
-				printk(KERN_ERR "JBD2: Invalid checksum "
-				       "recovering block %lu in log\n",
-				       next_log_block);
-				err = -EFSBADCRC;
-				brelse(bh);
-				goto failed;
+				need_check_commit_time = true;
+				jbd_debug(1, "invalid descriptor block found in %lu, continue recovery first.\n",next_log_block);
+				
 			}
 
 			/* If it is a valid descriptor block, replay it
@@ -535,6 +537,7 @@ static int do_one_pass(journal_t *journal,
 			if (pass != PASS_REPLAY) {
 				if (pass == PASS_SCAN &&
 				    jbd2_has_feature_checksum(journal) &&
+				    !need_check_commit_time &&
 				    !info->end_transaction) {
 					if (calc_chksums(journal, bh,
 							&next_log_block,
@@ -688,6 +691,36 @@ static int do_one_pass(journal_t *journal,
 			 * are present verify them in PASS_SCAN; else not
 			 * much to do other than move on to the next sequence
 			 * number. */
+			if(pass == PASS_SCAN) {
+				struct commit_header *cbh =
+					(struct commit_header *)bh->b_data;
+				if(need_check_commit_time) {
+					__be64 commit_time = be64_to_cpu(cbh->h_commit_sec);
+					if(commit_time >= last_trans_commit_time) {
+						printk(KERN_ERR "JBD2: Invalid checksum found in log, %d\n",
+						next_commit_ID);
+						err = -EFSBADCRC;
+						brelse(bh);
+						goto failed;
+					}
+					else
+					{
  Coding style requires to put opening { on the same line as 'else'. Like:
					else {
+						/*it's not belong to same journal, just end this recovery with success*/
+						jbd_debug(1, "JBD2: Invalid checksum found in block in log, but not same journal %d\n",
+						next_commit_ID);
+						err = 0;
+						brelse(bh);
+						goto done;
+					}
+				}
+				if(ri_commit_block) {
+					jbd_debug(1, "invalid commit block found in %lu, stop here.\n",next_log_block);
+					brelse(bh);
+					goto done;
+				}
+				ri_commit_block = next_log_block;

Why does the ri_commit_block logic exist? I don't see it bringing any
benefit...

+				last_trans_commit_time = be64_to_cpu(cbh->h_commit_sec);
+			}
 			if (pass == PASS_SCAN &&
 			    jbd2_has_feature_checksum(journal)) {
 				int chksum_err, chksum_seen;
@@ -755,6 +788,12 @@ static int do_one_pass(journal_t *journal,
 			continue;
 
 		case JBD2_REVOKE_BLOCK:
+			if (pass == PASS_SCAN && 
+				ri_commit_block) {
+				jbd_debug(1, "invalid revoke block found in %lu, stop here.\n",next_log_block);
+				brelse(bh);
+				goto done;
+			}

This is wrong. A valid transaction can start with a revoke block...

 			/* If we aren't in the REVOKE pass, then we can
 			 * just skip over this block. */
 			if (pass != PASS_REVOKE) {

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR