[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <937d4aa4-20bc-a1ab-1494-866e1f7a95a8@huawei.com>
Date: Fri, 11 Jan 2019 21:44:14 +0800
From: "zhangyi (F)" <yi.zhang@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>,
<adilger.kernel@...ger.ca>, <miaoxie@...wei.com>
Subject: Re: [PATCH] jbd2: set freed flag while revoking a buffer which
belongs to older transaction
Thanks a lot for your in-depth explanation, I check the code and get it now.
Will modify the patch as you suggested and post v2 after test.
Thanks,
Yi.
On 2019/1/11 18:30, Jan Kara Wrote:
> On Fri 11-01-19 14:11:31, zhangyi (F) wrote:
>> On 2019/1/10 19:20, Jan Kara Wrote:
>>> On Thu 10-01-19 14:12:02, zhangyi (F) wrote:
>>>> Now, we capture a data corruption problem on ext4 while we're truncating
>>>> an extent index block. Imaging that if we are revoking a buffer which
>>>> has been journaled by the committing transaction, the buffer's jbddirty
>>>> flag will not be cleared in jbd2_journal_forget(), so the commit code
>>>> will set the buffer dirty flag again after refile the buffer.
>>>>
>>>> fsx kjournald2
>>>> jbd2_journal_commit_transaction
>>>> jbd2_journal_revoke commit phase 1~5...
>>>> jbd2_journal_forget
>>>> belongs to older transaction commit phase 6
>>>> jbddirty not clear __jbd2_journal_refile_buffer
>>>> __jbd2_journal_unfile_buffer
>>>> test_clear_buffer_jbddirty
>>>> mark_buffer_dirty
>>>>
>>>> Finally, if the freed extent index block was allocated again as data
>>>> block by some other files, it may corrupt the file data when writing
>>>> cached pages later, such as during umount time.
>>>>
>>>> This patch mark buffer as freed when it already belongs to the
>>>> committing transaction in jbd2_journal_forget(), so that commit code
>>>> knows it should clear dirty bits when it is done with the buffer.
>>>>
>>>> This problem can be reproduced by xfstests generic/455 easily with
>>>> seeds (3246 3247 3248 3249).
>>>>
>>>> Signed-off-by: zhangyi (F) <yi.zhang@...wei.com>
>>>> Cc: stable@...r.kernel.org
>>>
>>> Thanks a lot for the analysis and the patch! I fully agree with your
>>> analysis however I think just setting buffer as freed isn't completely
>>> correct. The problem is following: The metadata buffer X has been modified
>>> by the commiting transaction - let's call it A. It has been freed in the
>>> currently running transaction B. Now jbd2_journal_forget() clears
>>> b_next_transaction and if you set buffer freed flag, X will not be added to
>>> the checkpoint list. So when transaction A finishes commit, it can get
>>> checkpointed (without writing out X) before transaction B commits. So if a
>>> crash occurs before B commits, we'd loose modification of X from
>>> transaction A and thus cause filesystem corruption.
>>>
>> Thanks for your explanation! There are still two points I don't quite
>> understand.
>>
>> I check all three cases of doing checkpoint. IIUC, both jbd2_journal_destroy()
>> and jbd2_journal_flush() wait the current running transaction B to complete
>> before doing checkpoint besides __jbd2_log_wait_for_space(). So I guess this is
>> the case that you mentioned of transaction A could be checkpointed before B
>> commits, am I right?
>
> Yes, __jbd2_log_wait_for_space() can checkpoint already committed
> transactions (i.e., A in our case) without waiting for the running
> transaction (B in our case).
>
>> For another case, jbd2_update_log_tail() will be invoked after transaction B
>> complete, so the problem above also can't happen here, right?
>
> I'm not sure which "another case" you speak about here...
>
>>> What rather needs to happen is the same thing that is done in
>>> journal_unmap_buffer() in this case: We set buffer freed flag and we also
>>> set b_next_transaction to the currently running transaction (B). This will
>>> prevent A from being checkpointed before B commits and thus avoids the
>>> problem above.
>>>
>> Sorry, I don't get this point. I find that the difference between setting
>> b_next_transaction or not is just re-added the buffer X to the BJ_Reserved
>> list or not. How could we avoid the problem above.
>
> Currently, X will be removed from transaction B by jbd2_journal_revoke().
> So once A commits, it will not be in the running transaction and thus
> checkpoint of A can complete before B is committed.
>
> If we set X->b_next_transaction to B, X will be part of transaction B. The
> handling of buffer_freed() buffer in commit code thus will not clear
> jbddirty bit and X will get inserted in X as buffer for checkpointing. And
> thus checkpoint of A will not be able to complete before B commits, fixing
> the problem I have described.
>
>> BTW, I am thinking of a similar case. If we modify buffer X instead of
>> revork it in the transaction B, we also need to avoid transaction A from
>> being checkpointed before B commits, because current buffer X contains the
>> modified data (modified by B). So we should prevent writing it before
>> B commits, otherwise it will corrupt metadata. How do we handle this
>> situation now?
>
> Buffers that are part of the running transaction never have buffer_dirty
> bit set (look how jbd2_journal_file_buffer() clears this bit). Thus
> background writeback will not write these buffers. Also checkpointing code
> checks whether the buffer is part of running / committing transaction and
> handles these buffers specially exactly because they cannot be written out
> directly.
>
> Honza
>
Powered by blists - more mailing lists