Date:   Fri, 11 Sep 2020 18:50:22 +0530
From:   Ritesh Harjani <>
To:     Ye Bin <>
Cc:     Jan Kara <>, …
Subject: Re: [PATCH v3] ext4: Fix dead loop in ext4_mb_new_blocks

Hello Ye,

Please excuse me if there is something horribly wrong with my email
formatting. I only received this laptop yesterday and am still setting
up a few things.

On 9/10/20 9:47 PM, Jan Kara wrote:
> On Thu 10-09-20 17:12:52, Ye Bin wrote:
>> While testing disk offline/online with fsstress running, we found that
>> the fsstress process stays in the running state forever:
>> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>> ....
>> kworker/u32:3-262   [004] ...1   140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114
>> ext4_mb_new_blocks
>> repeat:
>> 	ext4_mb_discard_preallocations_should_retry(sb, ac, &seq)
>> 		freed = ext4_mb_discard_preallocations
>> 			ext4_mb_discard_group_preallocations
>> 				this_cpu_inc(discard_pa_seq);
>> 		---> freed == 0
>> 		seq_retry = ext4_get_discard_pa_seq_sum
>> 			for_each_possible_cpu(__cpu)
>> 				__seq += per_cpu(discard_pa_seq, __cpu);
>> 		if (seq_retry != *seq) {
>> 			*seq = seq_retry;
>> 			ret = true;
>> 		}
>> As seen above, seq_retry is the sum of discard_pa_seq across all CPUs.
>> Even when ext4_mb_discard_group_preallocations() frees nothing, the
>> discard_pa_seq on this CPU may still have been incremented, so the
>> condition "seq_retry != *seq" is always met and we loop forever.
>> To fix this, make ext4_mb_discard_group_preallocations() increment
>> discard_pa_seq only when it actually finds preallocations to discard.
>> Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling")
>> Signed-off-by: Ye Bin <>
> Thanks for the patch. One comment below.
>> ---
>>   fs/ext4/mballoc.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f386fe62727d..fd55264dc3fe 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -4191,7 +4191,6 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>>   	INIT_LIST_HEAD(&list);
>>   repeat:
>>   	ext4_lock_group(sb, group);
>> -	this_cpu_inc(discard_pa_seq);
>>   	list_for_each_entry_safe(pa, tmp,
>>   				&grp->bb_prealloc_list, pa_group_list) {
>>   		spin_lock(&pa->pa_lock);
>> @@ -4233,6 +4232,9 @@ ext4_mb_discard_group_preallocations(struct super_block *sb,
>>   		goto out;
>>   	}
>> +	/* only increment when we find preallocations to discard */
>> +	this_cpu_inc(discard_pa_seq);
>> +
> This is a good place to increment the counter but I think you also need to
> handle the case:
>          if (free < needed && busy) {
>                  busy = 0;
>                  ext4_unlock_group(sb, group);
>                  cond_resched();
>                  goto repeat;
>          }
> We can unlock the group here after removing some preallocations and thus
> other processes checking discard_pa_seq could miss we did this. In fact I
> think the code is somewhat buggy here and we should also discard extents
> accumulated on "list" so far before unlocking the group. Ritesh?

mmm, so even though this code is not discarding those buffers before
unlocking, it has still removed them from grp->bb_prealloc_list and
added them to the local list. And since it will at some point get
scheduled again and start operating from the repeat: label, functionally
this should be ok. Am I missing anything?

Although I agree that discarding at least the current pa's before
unlocking the group may be a good idea, we should also check why it was
done this way in the first place.

I agree with Jan that we should increment discard_pa_seq only once we
actually have something to discard. I should have written a comment here
to explain why we did it this way. But I think the commit msg should
have all the history (since I have a habit of writing long commit
msgs ;)

But IIRC, it was done this way because a parallel thread may be
discarding all the preallocations; the current thread would then return
0 directly, since it only checks list_empty(&grp->bb_prealloc_list) in
ext4_mb_discard_group_preallocations().

Incrementing discard_pa_seq at other places would not help, because we
remove the pa nodes from grp->bb_prealloc_list into a local list and
then start operating on that. So meanwhile one thread may come along,
see that the list is empty and return 0, while some other thread is
still discarding from its local list. In short, the main problem is
that the current code removes the pa from grp->bb_prealloc_list and
adds it to a local list; anyone who then finds grp->bb_prealloc_list
empty will directly return 0.

So, maybe we could do something like this then?

 	ext4_lock_group(sb, group);
-	this_cpu_inc(discard_pa_seq);
 	list_for_each_entry_safe(pa, tmp,
 				&grp->bb_prealloc_list, pa_group_list) {
 		<...>
+		/* do this before calling list_del(&pa->pa_group_list) */
+		if (!free)
+			this_cpu_inc(discard_pa_seq);

 		/* we can trust pa_free ... */
 		free += pa->pa_free;

 		list_add(&pa->u.pa_tmp_list, &list);

I have some test cases around this to exercise the cases which were
failing, since in the world of parallelism you can't be 100% certain
about some corner case (like the one you just reported).
But I don't have my other box right now where I kept all of those, due
to some technical issues. I think I should be able to get it back by
next week; if not, I will set up my current machine for testing anyway.

