Message-ID: <ee706b3e-2a87-fa1a-570b-64870d5e49ad@huaweicloud.com>
Date: Thu, 1 Jun 2023 22:20:38 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Zhihao Cheng <chengzhihao1@...wei.com>, Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, yi.zhang@...wei.com, yukuai3@...wei.com
Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing
while doing checkpoint
On 2023/6/1 21:44, Zhihao Cheng wrote:
> On 2023/6/1 17:41, Jan Kara wrote:
>
> Hi, Jan
>> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>>> From: Zhihao Cheng <chengzhihao1@...wei.com>
>>>
>>> Following process,
>>>
>>> jbd2_journal_commit_transaction
>>>  // there are several dirty buffer heads in transaction->t_checkpoint_list
>>>            P1                         wb_workfn
>>> jbd2_log_do_checkpoint
>>>  if (buffer_locked(bh)) // false
>>>                            __block_write_full_page
>>>                             trylock_buffer(bh)
>>>                             test_clear_buffer_dirty(bh)
>>>  if (!buffer_dirty(bh))
>>>   __jbd2_journal_remove_checkpoint(jh)
>>>    if (buffer_write_io_error(bh)) // false
>>>                             >> bh IO error occurs <<
>>>  jbd2_cleanup_journal_tail
>>>   __jbd2_update_log_tail
>>>    jbd2_write_superblock
>>>     // The bh won't be replayed in next mount.
>>> , which could corrupt the ext4 image. A reproducer is available at [Link].
>>>
>>> Since the writeback process clears the buffer's dirty bit after locking the
>>> buffer head, we can fix this by checking the dirty bit first and the lock
>>> bit second: the buffer head can be removed only if it is neither dirty nor
>>> locked.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>>> Signed-off-by: Zhihao Cheng <chengzhihao1@...wei.com>
>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>
>> OK, the analysis is correct but I'm afraid the fix won't be that easy. The
>> reordering of tests you did below doesn't really help because CPU or the
>> compiler are free to order the loads (and stores) in whatever way they
>> wish. You'd have to use memory barriers when reading and modifying bh flags
>> (although the modification side is implicitly handled by the bitlock
>> code) to make this work reliably. But that is IMHO too subtle for this
>> code.
>>
>
I wrote two litmus tests in tools/memory-model to check the memory model for
these two cases, mirroring what jbd2_log_do_checkpoint() and __cp_buffer_busy()
do.
**1. test-ww-rr.litmus** //simulate our changes in jbd2_log_do_checkpoint()
C WW-RR

(*
 * Result: Never
 *
 * This test shows that the write-write ordering
 * (in P0()) is visible to the external process P1().
 *
 * bit 0: lock
 * bit 1: dirty
 *)

{
	x=2;
}

P0(int *x)
{
	int r1;

	r1 = READ_ONCE(*x);
	r1 = r1 | 1; //lock
	WRITE_ONCE(*x, r1);
	r1 = READ_ONCE(*x);
	r1 = r1 & 253; //&~2, clear dirty
	WRITE_ONCE(*x, r1);
}

P1(int *x)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*x);
	r1 = READ_ONCE(*x);
}

exists (1:r1=2 /\ 1:r0=1)
The test results are:
$ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-rr.litmus
Test WW-RR Allowed
States 6
1:r0=1; 1:r1=1;
1:r0=2; 1:r1=1;
1:r0=2; 1:r1=2;
1:r0=2; 1:r1=3;
1:r0=3; 1:r1=1;
1:r0=3; 1:r1=3;
No
Witnesses
Positive: 0 Negative: 6
Condition exists (1:r1=2 /\ 1:r0=1)
Observation WW-RR Never 0 6
Time WW-RR 0.02
Hash=d91ecee2379f8ac878b8d06f17967874
**2. test-ww-r.litmus** //simulate our changes in __cp_buffer_busy()
C WW-R

(*
 * Result: Never
 *
 * This test shows that the write-write ordering
 * (in P0()) is visible to the external process P1().
 *
 * bit 0: lock
 * bit 1: dirty
 *)

{
	x=2;
}

P0(int *x)
{
	int r1;

	r1 = READ_ONCE(*x);
	r1 = r1 | 1; //lock
	WRITE_ONCE(*x, r1);
	r1 = READ_ONCE(*x);
	r1 = r1 & 253; //&~2, clear dirty
	WRITE_ONCE(*x, r1);
}

P1(int *x)
{
	int r0;

	r0 = READ_ONCE(*x);
}

exists (1:r0=0)
The test results are:
$ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-r.litmus
Test WW-R Allowed
States 3
1:r0=1;
1:r0=2;
1:r0=3;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (1:r0=0)
Observation WW-R Never 0 3
Time WW-R 0.01
Hash=d76bb39c367dc0cbd0c363a87c6c9eb7
So it looks like the out-of-order situation cannot happen. Am I missing
something?
Thanks,
Yi.
> Do you mean there might be a sequence like the following:
>
> jbd2_log_do_checkpoint
>  if (buffer_dirty(bh))
>  else if (buffer_locked(bh))
>  else
>   __jbd2_journal_remove_checkpoint(jh)
>
> The CPU re-arranges the order of fetching the buffer state:
> reg_1 = buffer_locked(bh) // false
>                            lock_buffer(bh)
>                            clear_buffer_dirty(bh)
> reg_2 = buffer_dirty(bh) // false
>
> Then, jbd2_log_do_checkpoint() could become:
> if (reg_2)
> else if (reg_1)
> else
>  __jbd2_journal_remove_checkpoint(jh) // enter !
>
> Am I understanding right?
>
>> What we should be doing to avoid these races is to lock the bh. So
>> something like:
>>
>> if (jh->b_transaction != NULL) {
>> 	do stuff
>> }
>> if (!trylock_buffer(bh)) {
>> 	buffer_locked() branch
>> }
>> ... Now we have the buffer locked and can safely check for dirtiness
>>
>> And we need to do a similar treatment for journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list().
>>
>> BTW, I think we could merge journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list() into a single common function. I think we can
>> drop the nr_to_scan argument and just always cleanup the whole checkpoint
>> list and return the number of freed buffers. That way we have one less
>> function to deal with checkpoint list cleaning.
>>
>> Thinking about it some more maybe we can have a function like:
>>
>> int jbd2_try_remove_checkpoint(struct journal_head *jh)
>> {
>> 	struct buffer_head *bh = jh2bh(jh);
>>
>> 	if (!trylock_buffer(bh))
>> 		return -EBUSY;
>> 	if (buffer_dirty(bh)) {
>> 		unlock_buffer(bh);
>> 		return -EBUSY;
>> 	}
>> 	/*
>> 	 * Buffer is clean and the IO has finished (we hold the buffer lock) so
>> 	 * the checkpoint is done. We can safely remove the buffer from this
>> 	 * transaction.
>> 	 */
>> 	unlock_buffer(bh);
>> 	return __jbd2_journal_remove_checkpoint(jh);
>> }
>>
>> and that can be used with a bit of care in the checkpointing functions as
>> well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
>> journal_unmap_buffer().
>>
>> Honza
>>