Message-ID: <C0BA654A-C359-4EFD-9C88-79A5D0562FEC@whamcloud.com>
Date:   Fri, 15 May 2020 08:13:42 +0000
From:   Alex Zhuravlev <azhuravlev@...mcloud.com>
To:     Ritesh Harjani <riteshh@...ux.ibm.com>
CC:     Alex Zhuravlev <azhuravlev@...mcloud.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 1/2] ext4: mballoc to prefetch groups ahead of scanning 

Hi,

Thanks for the review. Answers inline..


> On 14 May 2020, at 12:42, Ritesh Harjani <riteshh@...ux.ibm.com> wrote:
> Not sure if I completely understand above. Care to explain a bit more
> pls?

I’m all for that, but need to know what’s missing in the description.

>> Together with the patch "ext4: limit scanning of uninitialized groups"
> Where is this patch which says ^^^^^ ?

Hmm, my bad, that's the patch you already reviewed - "ext4: skip non-loaded groups at cr=0/1".

>> the mount time of a 1PB filesystem is reduced significantly:
>>                0% full    50%-full unpatched    patched
>>   mount time       33s                9279s       563s
> 
> Looks good. Do you have perf improvement numbers with this patch alone?
> Wonder if this could show improvement if I make my FS image using dm-sparse?

Actually I did, but a few patches were in use at once, so it wouldn't be correct to attribute those numbers to this patch alone.
Will try to repeat the measurement for just this patch.

>> -ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group)
>> +ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
>> +				 int ignore_locked)
> 
> Should make ignore_locked as boolean.

OK

>> +	struct buffer_head *bh;
>> +	int nr;
> nr should be of type ext4_group_t.

OK


>> +	/* limit prefetching at cr=0, otherwise mballoc can
>> +	 * spend a lot of time loading imperfect groups */
> 
> Sorry not sure if I understand this completely. Please explain this
> a bit further.

Mballoc does a few passes (4 at most). Each pass has its own notion of a group worth trying to allocate in;
you can see the exact criteria in ext4_mb_good_group().
The first two passes are supposed to scan the groups as fast as possible, looking for very good chunks.
This check limits the IO issued during these passes; otherwise sooner or later (depending on filesystem size,
fragmentation, IO throughput, etc.) mballoc will get stuck.

>> +	if (ac->ac_criteria < 2 && ac->ac_prefetch_ios >= sbi->s_mb_prefetch_limit)
>> +		return;
> 
> So, what is ac_criteria is 3? We go and prefetch in that case?
> Or maybe do you mean that before even ac_critera reaches 3, the other
> condition may become false?

See ext4_mb_good_group() - this is the last pass: mballoc just scans bitmaps, since all the previous passes
found no “good” group. Basically mballoc has given up on fast allocation and we’re going to do IO anyway,
just to be able to allocate. So prefetching is unconditional at this pass.
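The gating described above can be sketched as a small predicate (a minimal illustration only - the name and parameters are mine, not the kernel's actual fields; the real check lives in the patch's prefetch path and uses ac->ac_criteria, ac->ac_prefetch_ios and sbi->s_mb_prefetch_limit):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Skip prefetch IO during the fast passes (cr < 2) once the per-allocation
 * IO budget is spent; at cr >= 2 mballoc scans bitmaps anyway, so the
 * prefetch is never skipped there.
 */
static bool should_skip_prefetch(int criteria, unsigned int prefetch_ios,
				 unsigned int prefetch_limit)
{
	return criteria < 2 && prefetch_ios >= prefetch_limit;
}
```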

>> +	/* batch prefetching to get few READs in flight */
>> +	nr = ac->ac_prefetch - group;
>> +	if (ac->ac_prefetch < group)
>> +		/* wrapped to the first groups */
>> +		nr += ngroups;
>> +	if (nr > 0)
>> +		return;
>> +	BUG_ON(nr < 0);
> 
> Ok, so IIUC, we are doing this prefetching only once at the start of
> each allocation criteria. Because only that time nr will be 0.
> i.e. ac->ac_prefetch == group.
> Is that the case?
Hmm, it shouldn’t work that way. As the allocation process goes group by group,
it will meet this “ac_prefetch == group” condition again, and that is where a new
prefetch window starts. ext4_mb_prefetch() checks and submits the IO, then
sets ac_prefetch to the start of the next prefetch window.
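In other words, nr is the wraparound distance from the current group to the start of the next prefetch window, and a new window is issued only when that distance reaches zero. A standalone sketch of the distance computation (signed, as in the quoted hunk; names are illustrative):

```c
#include <assert.h>

/*
 * Distance (in groups) from `group` to the window start `prefetch`,
 * wrapping past the last group back to group 0. A result of 0 means
 * a new prefetch window should start here.
 */
static int groups_until_next_window(int prefetch, int group, int ngroups)
{
	int nr = prefetch - group;

	if (nr < 0)		/* window start wrapped to the first groups */
		nr += ngroups;
	return nr;
}
```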

>> +	if (ext4_has_feature_flex_bg(sb)) {
>> +		/* align to flex_bg to get more bitmas with a single IO */
> 
> /s/bitmas/bitmaps/

OK

>> +		if (++group >= ngroups)
>> +			group = 0;
> 
> So say nr = 32. and the group that you are started with has
> value = (ngroups - 32). Then when we are looping for the final time,
> nr will be 0, and the above check will make group = 0

Yes, this is the group number, which wraps when we reach the last one.

> Though this case could be a corner case, but maybe something to be
> improved upon? Thoughts?

OK

>> +static void
>> +ext4_mb_prefetch_fini(struct ext4_allocation_context *ac)
> 
> Did you mean ext4_mb_prefetch_finish()?
> Not sure if I understand the function name completely.

Hmm, that’s the convention I’m used to: ini - init, fini - finish. It’s a pretty common naming scheme.

>> +		if (group-- == 0)
>> +			group = ext4_get_groups_count(ac->ac_sb) - 1;
> 
> why play with an unsigned int like this in above.
> below looks much simpler no?
> 
> if (group == 0)
> 	group = ext4_get_groups_count(ac->ac_sb);
> group--;

Well, that’s an extra 5 bytes replicated and transferred a zillion times around the globe.
Sorry, just kidding - I’m fine with changing this if you find it better.
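For what it's worth, both spellings compute the same previous-group-with-wraparound; the second just avoids the post-decrement of an unsigned value inside the comparison. A side-by-side sketch (function names are mine, for illustration only):

```c
#include <assert.h>

/* Form from the patch: post-decrement inside the zero check. */
static unsigned int prev_group_a(unsigned int group, unsigned int ngroups)
{
	if (group-- == 0)
		group = ngroups - 1;
	return group;
}

/* Form suggested in the review: check first, then decrement. */
static unsigned int prev_group_b(unsigned int group, unsigned int ngroups)
{
	if (group == 0)
		group = ngroups;
	return group - 1;
}
```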

Thanks, Alex
