Message-ID: <0e48b9bb-f839-4b3b-dbce-45755618df97@linux.ibm.com>
Date: Fri, 11 Sep 2020 18:50:22 +0530
From: Ritesh Harjani <riteshh@...ux.ibm.com>
To: Ye Bin <yebin10@...wei.com>
Cc: Jan Kara <jack@...e.cz>, tytso@....edu, adilger.kernel@...ger.ca,
jack@...e.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks
Hello Ye,
Please excuse any issues with my email formatting. I received this
laptop only yesterday and am still setting a few things up.
On 9/10/20 9:47 PM, Jan Kara wrote:
> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>> As we test disk offline/online with running fsstress, we find fsstress
>> process is keeping running state.
>> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>> ....
>> kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>>
>> ext4_mb_new_blocks
>> repeat:
>> ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>> freed = ext4_mb_discard_preallocations
>> ext4_mb_discard_group_preallocations
>> this_cpu_inc(discard_pa_seq);
>> ---> freed == 0
>> seq_retry = ext4_get_discard_pa_seq_sum
>> for_each_possible_cpu(__cpu)
>> __seq += per_cpu(discard_pa_seq, __cpu);
>> if (seq_retry != *seq) {
>> *seq = seq_retry;
>> ret = true;
>> }
>>
>> As seen above, seq_retry is the sum of discard_pa_seq across all CPUs.
>> Even if ext4_mb_discard_group_preallocations() returns zero,
>> discard_pa_seq on this CPU may have been incremented, so the condition
>> "seq_retry != *seq" is always met.
>> To fix this problem, make ext4_mb_discard_group_preallocations()
>> increment discard_pa_seq only when it finds a preallocation to discard.
>>
>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
>> Signed-off-by: Ye Bin <yebin10@...wei.com>
>
> Thanks for the patch. One comment below.
>
>> ---
>> fs/ext4/mballoc.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f386fe62727d..fd55264dc3fe 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>> INIT_LIST_HEAD(&list);
>> repeat:
>> ext4_lock_group(sb, group);
>> - this_cpu_inc(discard_pa_seq);
>> list_for_each_entry_safe(pa, tmp,
>> &grp->bb_prealloc_list, pa_group_list) {
>> spin_lock(&pa->pa_lock);
>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>> goto out;
>> }
>>
>> + /* only increase when we find a preallocation to discard */
>> + this_cpu_inc(discard_pa_seq);
>> +
>
> This is a good place to increment the counter but I think you also need to
> handle the case:
>
> if (free < needed && busy) {
> busy = 0;
> ext4_unlock_group(sb, group);
> cond_resched();
> goto repeat;
> }
>
> We can unlock the group here after removing some preallocations and thus
> other processes checking discard_pa_seq could miss we did this. In fact I
> think the code is somewhat buggy here and we should also discard extents
> accumulated on "list" so far before unlocking the group. Ritesh?
>
Hmm, even though this code does not discard those buffers before
unlocking, it has still removed them from grp->bb_prealloc_list and
added them to the local list. And since it will at some point get
scheduled again and resume from the repeat: label, functionality-wise
this should be OK. Am I missing anything?
Although I agree that discarding at least the current PAs before
unlocking the group may be a good idea, we should also check
why it was done this way in the first place.
I agree with Jan that we should increment discard_pa_seq only once we
actually have something to discard. I should have added a comment here
explaining why we did it this way. But I think the commit msg should
have all the history (since I have a habit of writing long commit
msgs ;)
But IIRC, it was done like this because a parallel thread may be
discarding all the preallocations: the current thread then sees the
list_empty(&grp->bb_prealloc_list) check in
ext4_mb_discard_group_preallocations() succeed and returns 0 directly.
Incrementing discard_pa_seq at other places may not help, because we
remove the pa nodes from grp->bb_prealloc_list onto a local list and
then operate on that. So meanwhile one thread may come along, see that
the group list is empty, and return 0, while another thread is still
discarding from its local list.
So I guess the main problem is that the current code removes the pa
from grp->bb_prealloc_list and adds it to a local list; if someone else
then checks grp->bb_prealloc_list and finds it empty, it will directly
return 0.
So, maybe we could do something like this then?

repeat:
	ext4_lock_group(sb, group);
-	this_cpu_inc(discard_pa_seq);
	list_for_each_entry_safe(pa, tmp,
			&grp->bb_prealloc_list, pa_group_list) {
		<...>
+		/* we should do this here, before calling list_del(&pa->pa_group_list) */
+		if (!free)
+			this_cpu_inc(discard_pa_seq);
		/* we can trust pa_free ... */
		free += pa->pa_free;
		spin_unlock(&pa->pa_lock);
		list_del(&pa->pa_group_list);
		list_add(&pa->u.pa_tmp_list, &list);
	}
I have some test cases around this, covering cases which were failing,
since in the world of parallelism you can't be 100% certain about every
corner case (like the one you just reported).
But I don't have my other box right now where I kept all of those, due
to some technical issues. I think I should be able to get to them by
next week; if not, I will set up my current machine for testing this
anyway.
-ritesh