Message-ID: <dcb72c9d-001a-e416-b4cb-c78baedcb236@huaweicloud.com>
Date: Thu, 4 May 2023 19:35:29 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, yi.zhang@...wei.com, yukuai3@...wei.com,
chengzhihao1@...wei.com
Subject: Re: [PATCH] jbd2: recheck chechpointing non-dirty buffer
On 2023/5/3 23:50, Jan Kara wrote:
> On Wed 26-04-23 21:10:41, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> There is a long-standing metadata corruption issue that happens from
>> time to time, but it is very difficult to reproduce and analyse. Thanks
>> to the JBD2_CYCLE_RECORD option, we found that the problem is that the
>> checkpointing process fails to write out some buffers when it races
>> with another do_get_write_access(). See below for details.
>>
>> jbd2_log_do_checkpoint() //transaction X
>>  //buffer A is dirty and does not belong to any transaction
>>  __buffer_relink_io() //move it to the IO list
>>  __flush_batch()
>>   write_dirty_buffer()
>>                              do_get_write_access()
>>                                clear_buffer_dirty
>>                                __jbd2_journal_file_buffer()
>>                                //add buffer A to a new transaction Y
>>    lock_buffer(bh)
>>    //doesn't write out
>>  __jbd2_journal_remove_checkpoint()
>>  //finish checkpoint except buffer A
>>  //filesystem is corrupted if the new transaction Y isn't fully written out.
>>
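The interleaving above can be replayed deterministically in a small userspace model. This is only a sketch of the trace, not kernel code: struct buf, write_dirty_buffer_model(), get_write_access_model() and replay_race() are hypothetical names made up for illustration, and the only behaviour assumed is that a write helper skips a buffer whose dirty bit is already clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer_head; not the kernel structure. */
struct buf {
	bool dirty;	/* models buffer_dirty(bh) */
	bool written;	/* set once the data has reached disk */
	int owner_tid;	/* 0 = not attached to any running transaction */
};

/* Models write_dirty_buffer(): a buffer that is no longer dirty is skipped. */
static void write_dirty_buffer_model(struct buf *b)
{
	if (!b->dirty)
		return;
	b->dirty = false;
	b->written = true;
}

/* Models the racing do_get_write_access() steps listed in the trace. */
static void get_write_access_model(struct buf *b, int tid)
{
	b->dirty = false;	/* clear_buffer_dirty */
	b->owner_tid = tid;	/* __jbd2_journal_file_buffer() */
}

/* Returns true if the checkpoint finished although buffer A never hit disk. */
static bool replay_race(void)
{
	struct buf a = { .dirty = true, .written = false, .owner_tid = 0 };
	bool on_io_list;

	on_io_list = true;		/* checkpoint: __buffer_relink_io() */
	get_write_access_model(&a, 2);	/* racer: transaction Y takes A */
	write_dirty_buffer_model(&a);	/* checkpoint: write is skipped */
	on_io_list = false;		/* __jbd2_journal_remove_checkpoint() */

	return !on_io_list && !a.written;
}
```

Calling replay_race() reports that transaction X's checkpoint completed while buffer A was never written, which is the window in which a crash before Y is fully written out loses the data.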
>> The fix is subtle because we can't trust the checkpointing buffers and
>> transactions once we release the j_list_lock: they could be written back
>> and checkpointed by someone else, or they could have been added to a new
>> transaction. So we have to re-add them to the checkpoint list and
>> recheck whether they are clean and no longer need to be written out.
>>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>> Tested-by: Zhihao Cheng <chengzhihao1@...wei.com>
>
> Thanks for the analysis. This indeed looks like a nasty issue to debug. I
> think we can actually solve the problem by simplifying the checkpointing
> code in jbd2_log_do_checkpoint(), not by making it more complex. What I
> think we can do is that we can completely remove the t_checkpoint_io_list
> and only keep buffers on t_checkpoint_list. When processing
> t_checkpoint_list in jbd2_log_do_checkpoint(), we just need to make sure to
> move t_checkpoint_list pointer to the next buffer when adding buffer to
> j_chkpt_bhs array. That way buffers to submit / already submitted buffers
> will be accumulating at the tail of the list. The logic in the loop already
> handles waiting for buffers under IO / removing cleaned buffers so this
> makes sure the list will eventually get empty. Buffers cannot get redirtied
> without being removed from the checkpoint list and moved to a newer
> transaction's checkpoint list so forward progress is guaranteed. The only
> other tweak we need to add is to check for the situation when all the
> buffers are in the j_chkpt_bhs array. So the end of the loop should look
> like:
>
> 	transaction->t_checkpoint_list = jh->j_cpnext;
> 	if (batch_count == JBD2_NR_BATCH || need_resched() ||
> 	    spin_needbreak(&journal->j_list_lock) ||
> 	    transaction->t_checkpoint_list == journal->j_chkpt_bhs[0])
> 		flush and restart
>
> and that should be it. What do you think?
>
This solution sounds great. Let me do it.
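
To check my understanding of the single-list scheme, here is a minimal userspace sketch of the loop. It is a model under assumptions, not the jbd2 code: struct jhead, struct txn, cp_add(), cp_remove(), flush_batch() and do_checkpoint() are hypothetical names, NR_BATCH is a small stand-in for JBD2_NR_BATCH, IO completion is modelled as an instant state change, and need_resched() / spin_needbreak() are left out. Any queued buffers are also submitted before waiting on a buffer under IO, which is my own assumption about how the wait path would need to behave.

```c
#include <assert.h>
#include <stddef.h>

#define NR_BATCH 4	/* demo value standing in for JBD2_NR_BATCH */

enum buf_state { BUF_DIRTY, BUF_UNDER_IO, BUF_CLEAN };

struct jhead {			/* hypothetical stand-in for a journal head */
	enum buf_state state;
	struct jhead *cpnext, *cpprev;
};

struct txn {			/* hypothetical stand-in for a transaction */
	struct jhead *checkpoint_list;	/* circular list, NULL when empty */
};

/* Append jh at the tail of the circular checkpoint list. */
static void cp_add(struct txn *t, struct jhead *jh)
{
	struct jhead *first = t->checkpoint_list;

	if (!first) {
		jh->cpnext = jh->cpprev = jh;
		t->checkpoint_list = jh;
		return;
	}
	jh->cpnext = first;
	jh->cpprev = first->cpprev;
	first->cpprev->cpnext = jh;
	first->cpprev = jh;
}

/* Unlink jh, moving the list pointer forward if it pointed at jh. */
static void cp_remove(struct txn *t, struct jhead *jh)
{
	if (jh->cpnext == jh) {
		t->checkpoint_list = NULL;
	} else {
		jh->cpprev->cpnext = jh->cpnext;
		jh->cpnext->cpprev = jh->cpprev;
		if (t->checkpoint_list == jh)
			t->checkpoint_list = jh->cpnext;
	}
	jh->cpnext = jh->cpprev = NULL;
}

/* Submit every queued buffer; they stay on the list, now under IO. */
static void flush_batch(struct jhead **batch, int *count)
{
	for (int i = 0; i < *count; i++) {
		assert(batch[i]->state == BUF_DIRTY);
		batch[i]->state = BUF_UNDER_IO;
	}
	*count = 0;
}

/* Runs until the checkpoint list is empty; returns the number of flushes. */
static int do_checkpoint(struct txn *t)
{
	struct jhead *batch[NR_BATCH];
	int batch_count = 0, flushes = 0;

	while (t->checkpoint_list) {
		struct jhead *jh = t->checkpoint_list;

		if (jh->state == BUF_UNDER_IO) {
			/* Submit queued buffers before "waiting" on IO. */
			if (batch_count) {
				flush_batch(batch, &batch_count);
				flushes++;
			}
			jh->state = BUF_CLEAN;	/* models the wait finishing */
			continue;
		}

		if (jh->state == BUF_CLEAN) {
			cp_remove(t, jh);
		} else {
			/* Queue the buffer but leave it on the list. */
			batch[batch_count++] = jh;
			t->checkpoint_list = jh->cpnext;
		}

		/* Flush when the batch is full or the pointer wrapped to it. */
		if (batch_count &&
		    (batch_count == NR_BATCH ||
		     t->checkpoint_list == batch[0])) {
			flush_batch(batch, &batch_count);
			flushes++;
		}
	}
	return flushes;
}
```

In this model a buffer only leaves the list once it has been seen clean, so a buffer that was queued but never actually written is looked at again instead of being dropped with the rest of the batch.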
Thanks,
Yi.