linux-ext4 - Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit

Message-ID: <20240524162231.l5r4niz7awjgfju6@quack3>
Date: Fri, 24 May 2024 18:22:31 +0200
From: Jan Kara <jack@...e.cz>
To: "Luis Henriques (SUSE)" <luis.henriques@...ux.dev>
Cc: Theodore Ts'o <tytso@....edu>, Andreas Dilger <adilger@...ger.ca>,
	Jan Kara <jack@...e.cz>,
	Harshad Shirwadkar <harshadshirwadkar@...il.com>,
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full
 journal commit

On Thu 23-05-24 12:16:18, Luis Henriques (SUSE) wrote:
> When a full journal commit is on-going, any fast commit has to be enqueued
> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
> is done only once, i.e. if an inode is already queued in a previous fast
> commit entry it won't be enqueued again.  However, if a full commit starts
> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
> be done into FC_Q_STAGING.  And this is not being done in function
> ext4_fc_track_template().
> 
> This patch fixes the issue by flagging an inode that is already enqueued in
> either queue.  Later, during the fast commit clean-up callback, if the
> inode has a tid that is bigger than the one being handled, that inode is
> re-enqueued into STAGING and then spliced back into MAIN.
> 
> This bug was found using fstest generic/047.  This test creates several 32k
> byte files, syncing each of them after its creation, and then shuts
> down the filesystem.  Some data may be lost in this operation; for example, a
> file may have its size truncated to zero.
> 
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@...ux.dev>

Thanks for the fix. Some comments below:

> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 983dad8c07ec..4c308c18c3da 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1062,9 +1062,18 @@ struct ext4_inode_info {
>  	/* Fast commit wait queue for this inode */
>  	wait_queue_head_t i_fc_wait;
>  
> -	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
> +	/*
> +	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len,
> +	 * i_fc_next
> +	 */
>  	struct mutex i_fc_lock;
>  
> +	/*
> +	 * Used to flag an inode as part of the next fast commit; will be
> +	 * reset during fast commit clean-up
> +	 */
> +	tid_t i_fc_next;
> +

Do we really need a new tid in the inode? I was hoping we could use
EXT4_I(inode)->i_sync_tid for this - I can see we already set it in
ext4_fc_track_template() and use it for similar comparisons in the fast
commit code.

> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 87c009e0c59a..bfdf249f0783 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -402,6 +402,8 @@ static int ext4_fc_track_template(
>  				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
>  				&sbi->s_fc_q[FC_Q_STAGING] :
>  				&sbi->s_fc_q[FC_Q_MAIN]);
> +	else
> +		ei->i_fc_next = tid;
>  	spin_unlock(&sbi->s_fc_lock);
>  
>  	return ret;
> @@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
>  				 i_fc_list) {
>  		list_del_init(&iter->i_fc_list);
> +		if (iter->i_fc_next == tid)
> +			iter->i_fc_next = 0;
> +		else if (iter->i_fc_next > tid)
			 ^^^ careful here, TIDs do wrap so you need to use
tid_geq() for comparison.

> +			/*
> +			 * re-enqueue inode into STAGING, which will later be
> +			 * spliced back into MAIN
> +			 */
> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> +				      &sbi->s_fc_q[FC_Q_STAGING]);
>  		ext4_clear_inode_state(&iter->vfs_inode,
>  				       EXT4_STATE_FC_COMMITTING);
>  		if (iter->i_sync_tid <= tid)
				     ^^^ and I can see this is buggy as
well and needs tid_geq() (not your fault obviously).
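
To illustrate why plain `>` and `<=` misbehave once TIDs wrap, here is a
minimal userspace sketch modeled on the tid_gt()/tid_geq() helpers in
include/linux/jbd2.h. The typedef is a stand-in for the kernel's tid_t, and
the two fc_*() wrappers are hypothetical names used only to show how the
comparisons flagged above might be rewritten; they are not part of the patch.

```c
#include <stdint.h>

/* Stand-in for the kernel's tid_t: an unsigned 32-bit transaction ID. */
typedef uint32_t tid_t;

/*
 * Wraparound-safe "x is newer than y". The unsigned subtraction wraps
 * modulo 2^32, and reading the result as signed gives the right answer
 * as long as the two TIDs are less than 2^31 apart.
 */
static inline int tid_gt(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference > 0;
}

/* Wraparound-safe "x is newer than or equal to y". */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Hypothetical: replaces "iter->i_fc_next > tid" from the patch. */
static inline int fc_needs_requeue(tid_t i_fc_next, tid_t tid)
{
	return tid_gt(i_fc_next, tid);
}

/* Hypothetical: replaces "iter->i_sync_tid <= tid" in ext4_fc_cleanup(). */
static inline int fc_sync_tid_covered(tid_t i_sync_tid, tid_t tid)
{
	return tid_geq(tid, i_sync_tid);
}
```

With an old TID of 0xfffffffe and a new TID of 1 (three transactions later,
after the counter wrapped), the plain test `1 > 0xfffffffe` is false, while
tid_gt(1, 0xfffffffe) correctly reports that 1 is the newer transaction.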

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR