[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <tencent_CF3DC37BEB2026CB2F68408A2B62314E0C08@qq.com>
Date: Fri, 12 Jul 2024 00:16:34 +0800
From: Wang Jianjian <wangjianjian0@...mail.com>
To: Luis Henriques <luis.henriques@...ux.dev>,
"wangjianjian (C)" <wangjianjian3@...wei.com>
Cc: Theodore Ts'o <tytso@....edu>, Andreas Dilger <adilger@...ger.ca>,
Jan Kara <jack@...e.cz>, Harshad Shirwadkar <harshadshirwadkar@...il.com>,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full
journal commit
On 2024/7/11 23:16, Luis Henriques wrote:
> On Thu, Jul 11 2024, wangjianjian (C) wrote:
>
>> On 2024/7/11 16:35, Luis Henriques (SUSE) wrote:
>>> When a full journal commit is on-going, any fast commit has to be enqueued
>>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing
>>> is done only once, i.e. if an inode is already queued in a previous fast
>>> commit entry it won't be enqueued again. However, if a full commit starts
>>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>>> be done into FC_Q_STAGING. And this is not being done in function
>>> ext4_fc_track_template().
>>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>>> greater than the one being handled. The STAGING queue will then be spliced
>>> back into MAIN.
>>> This bug was found using fstest generic/047. This test creates several 32k
>>> bytes files, sync'ing each of them after it's creation, and then shutting
>>> down the filesystem. Some data may be loss in this operation; for example a
>>> file may have it's size truncated to zero.
>>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@...ux.dev>
>>> ---
>>> Hi!
>>> v4 of this patch enqueues the inode into STAGING *only* if the current tid
>>> is non-zero. It will be zero when doing an fc commit, and this would mean
>>> to always re-enqueue the inode. This fixes the regressions caught by Ted
>>> in v3 with fstests generic/472 generic/496 generic/643.
>>> Also, since 2nd patch of v3 has already been merged, I've rebased this patch
>>> to be applied on top of it.
>>> fs/ext4/fast_commit.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>>> index 3926a05eceee..facbc8dbbaa2 100644
>>> --- a/fs/ext4/fast_commit.c
>>> +++ b/fs/ext4/fast_commit.c
>>> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>> EXT4_STATE_FC_COMMITTING);
>>> if (tid_geq(tid, iter->i_sync_tid))
>>> ext4_fc_reset_inode(&iter->vfs_inode);
>>> + } else if (tid) {
>>> + /*
>>> + * If the tid is valid (i.e. non-zero) re-enqueue the
>> one quick question about tid, if one disk is using long time and its tid get
>> wrapped to 0, is it a valid seq? I don't find code handling this situation.
> Hmm... OK. So, to answer to your question, the 'tid' is expected to wrap.
> That's why we use:
>
> if (tid_geq(tid, iter->i_sync_tid))
Yes, I know this.
>
> instead of:
>
> if (tid >= iter->i_sync_tid)
>
> (The second patch in v3 actually fixed a few places where the tid_*()
> helpers weren't being used.)
>
> But your question shows me that my patch is wrong as '0' may actually be a
> valid 'tid' value.
Actually my question is, there are some place use '0' to check if a
transaction is valid, e.g.
In ext4_wait_for_tail_page_commit()
5218 while (1) {
5219 struct folio *folio =
filemap_lock_folio(inode->i_mapping,
5220 inode->i_size >> PAGE_SHIFT);
5221 if (IS_ERR(folio))
5222 return;
5223 ret = __ext4_journalled_invalidate_folio(folio, offset,
5224 folio_size(folio) - offset);
5225 folio_unlock(folio);
5226 folio_put(folio);
5227 if (ret != -EBUSY)
5228 return;
5229 commit_tid = 0;
5230 read_lock(&journal->j_state_lock);
5231 if (journal->j_committing_transaction)
5232 commit_tid =
journal->j_committing_transaction->t_tid;
5233 read_unlock(&journal->j_state_lock);
5234 if (commit_tid)
5235 jbd2_log_wait_commit(journal, commit_tid);
5236 }
5237 We only wait commit if tid is not zero.
And in __jbd2_log_wait_for_space()
79 if (space_left < nblocks) {
80 int chkpt =
journal->j_checkpoint_transactions != NULL;
81 tid_t tid = 0;
82
83 if (journal->j_committing_transaction)
84 tid =
journal->j_committing_transaction->t_tid;
85 spin_unlock(&journal->j_list_lock);
86 write_unlock(&journal->j_state_lock);
87 if (chkpt) {
88 jbd2_log_do_checkpoint(journal);
89 } else if
(jbd2_cleanup_journal_tail(journal) == 0) {
90 /* We were able to recover space;
yay! */
91 ;
92 } else if (tid) {
93 /*
94 *
jbd2_journal_commit_transaction() may want
95 * to take the checkpoint_mutex if
JBD2_FLUSHED
96 * is set. So we need to
temporarily drop it.
97 */
98 mutex_unlock(&journal->j_checkpoint_mutex);
99 jbd2_log_wait_commit(journal, tid);
100 write_lock(&journal->j_state_lock);
101 continue;
We also only wait commit if tid is not zero.
Does it mean all these have bugs if '0' is a valid 'tid' ?
But on the other hand, if we don't consider sync and fsync, and default
commit interval is 5s,
time of tid wrap to 0 is nearly 680 years. However, we can run
sync/fsync to make tid to increase
more quickly in real world ?
> Cheers,
Powered by blists - more mailing lists