linux-kernel - Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full journal commit

Message-ID: <87msob45o7.fsf@brahms.olymp>
Date: Mon, 27 May 2024 16:48:24 +0100
From: Luis Henriques <luis.henriques@...ux.dev>
To: Jan Kara <jack@...e.cz>
Cc: Theodore Ts'o <tytso@....edu>,  Andreas Dilger <adilger@...ger.ca>,
  Harshad Shirwadkar <harshadshirwadkar@...il.com>,
  linux-ext4@...r.kernel.org,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ext4: fix fast commit inode enqueueing during a full
 journal commit

On Mon 27 May 2024 09:29:40 AM +01, Luis Henriques wrote:

<snip>

>>> +	/*
>>> +	 * Used to flag an inode as part of the next fast commit; will be
>>> +	 * reset during fast commit clean-up
>>> +	 */
>>> +	tid_t i_fc_next;
>>> +
>>
>> Do we really need new tid in the inode? I'd be kind of hoping we could use
>> EXT4_I(inode)->i_sync_tid for this - I can see we even already set it in
>> ext4_fc_track_template() and used for similar comparisons in fast commit
>> code.
>
> Ah, true.  It looks like it could be used indeed.  We'll still need a flag
> here, but a simple bool should be enough for that.

After looking again at the code, I'm not 100% sure that this is actually
doable.  For example, if I replace the above by

	bool i_fc_next;

and set it to 'true' below:

>>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>>> index 87c009e0c59a..bfdf249f0783 100644
>>> --- a/fs/ext4/fast_commit.c
>>> +++ b/fs/ext4/fast_commit.c
>>> @@ -402,6 +402,8 @@ static int ext4_fc_track_template(
>>>  				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
>>>  				&sbi->s_fc_q[FC_Q_STAGING] :
>>>  				&sbi->s_fc_q[FC_Q_MAIN]);
>>> +	else
>>> +		ei->i_fc_next = tid;

		ei->i_fc_next = true;

Then, when we get to the ext4_fc_cleanup(), the value of iter->i_sync_tid
may have changed in the meantime from, e.g., ext4_do_update_inode() or
__ext4_iget().  This would cause the clean-up code to be bogus if it still
implements the logic below, comparing the tid with i_sync_tid.
(Although, to be honest, I couldn't see any visible effect in the quick
testing I've done.)  Or am I missing something, and this is *exactly* the
behaviour you'd expect?

Cheers,
-- 
Luis

>>>  	spin_unlock(&sbi->s_fc_lock);
>>>  
>>>  	return ret;
>>> @@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>>  	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
>>>  				 i_fc_list) {
>>>  		list_del_init(&iter->i_fc_list);
>>> +		if (iter->i_fc_next == tid)
>>> +			iter->i_fc_next = 0;
>>> +		else if (iter->i_fc_next > tid)
>> 			 ^^^ careful here, TIDs do wrap so you need to use
>> tid_geq() for comparison.
>>
>
> Yikes!  Thanks, I'll update the code to do that.
>
>>> +			/*
>>> +			 * re-enqueue inode into STAGING, which will later be
>>> +			 * spliced back into MAIN
>>> +			 */
>>> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
>>> +				      &sbi->s_fc_q[FC_Q_STAGING]);
>>>  		ext4_clear_inode_state(&iter->vfs_inode,
>>>  				       EXT4_STATE_FC_COMMITTING);
>>>  		if (iter->i_sync_tid <= tid)
>> 				     ^^^ and I can see this is buggy as
>> well and needs tid_geq() (not your fault obviously).
>
> Yeah, good point.  I can do that too in v3.
>
> Again, thanks a lot for your review!
>
> Cheers,
> -- 
> Luís
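
For reference, the wraparound issue raised in the review above can be
illustrated with a small userspace sketch.  The two helpers are modelled
on the tid_gt()/tid_geq() inlines from include/linux/jbd2.h; the tid_t
typedef, the fixed-width integer types and tid_selfcheck() are
assumptions made only so the example builds outside the kernel.  It also
relies on the usual two's-complement behaviour when converting an
out-of-range unsigned value to a signed one.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's tid_t (an unsigned 32-bit counter). */
typedef uint32_t tid_t;

/*
 * Wrap-safe "x is newer than y": take the unsigned difference and
 * interpret it as signed, so that a TID just past the wrap point
 * still compares as newer than one just before it.
 */
static inline int tid_gt(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference > 0;
}

/* Wrap-safe "x is newer than or equal to y". */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}

/* Hypothetical self-check, not part of any kernel code. */
void tid_selfcheck(void)
{
	tid_t before_wrap = 0xfffffffeu;	/* shortly before wrapping */
	tid_t after_wrap = 2u;			/* four transactions later */

	/* A plain '>' gets the ordering wrong across the wrap... */
	assert(!(after_wrap > before_wrap));
	/* ...while the signed-difference helper gets it right. */
	assert(tid_gt(after_wrap, before_wrap));
	assert(tid_geq(after_wrap, after_wrap));
}
```

With helpers like these, a check such as "iter->i_sync_tid <= tid" would
presumably be written as tid_geq(tid, iter->i_sync_tid), and the
"i_fc_next > tid" test as tid_gt(iter->i_fc_next, tid).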