lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9114E776-B44E-4CA5-BD49-C432A688C24E@whamcloud.com>
Date:   Thu, 21 Nov 2019 08:52:10 +0000
From:   Alex Zhuravlev <azhuravlev@...mcloud.com>
To:     Artem Blagodarenko <artem.blagodarenko@...il.com>
CC:     "Theodore Y. Ts'o" <tytso@....edu>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [RFC] improve malloc for large filesystems



> On 21 Nov 2019, at 11:30, Artem Blagodarenko <artem.blagodarenko@...il.com> wrote:
> 
> Hello Alex,
> 
> Code looks good, but I have objections about the approach.
> 
> 512TB disk with 4k block size have 4194304 groups. So 4k groups is only ~0.01% of whole disk.
> Can we make decision to break search and get minimum blocks based on such limited data.
> I am not sure that spending some time to find good group is worse then allocate blocks without 
> optimisation. Especially, if disk is quite free and there are a lot of free block groups.

Exact number isn’t hardcoded and subject to discussion, but you don’t really want to scan 4M 
groups (especially uninitialised) to find “best” chunk.

This can be optimized further like “don’t count initialized and/or empty groups”, but still some limit
Is required, IMO. Notice this limit doesn’t apply if once we tried to find “best”, i.e. it’s applied only
with cr=0 and cr=1.


Thanks, Alex

> 
> Best regards,
> Artem Blagodarenko.
>> On 21 Nov 2019, at 10:03, Alex Zhuravlev <azhuravlev@...mcloud.com> wrote:
>> 
>> 
>> 
>>> On 20 Nov 2019, at 21:13, Theodore Y. Ts'o <tytso@....edu> wrote:
>>> 
>>> Hi Alex,
>>> 
>>> A couple of comments.  First, please separate this patch so that these
>>> two separate pieces of functionality can be reviewed and tested
>>> separately:
>>> 
>> 
>> This is the first patch of the series.
>> 
>> Thanks, Alex
>> 
>> From 81c4b3b5a17d94525bbc6d2d89b20f6618b05bc6 Mon Sep 17 00:00:00 2001
>> From: Alex Zhuravlev <bzzz@...mcloud.com>
>> Date: Thu, 21 Nov 2019 09:53:13 +0300
>> Subject: [PATCH 1/2] ext4: limit scanning for a good group
>> 
>> at first two rounds to prevent situation when 10x-100x thousand
>> of groups are scanned, especially non-initialized groups.
>> 
>> Signed-off-by: Alex Zhuravlev <bzzz@...mcloud.com>
>> ---
>> fs/ext4/ext4.h    |  2 ++
>> fs/ext4/mballoc.c | 14 ++++++++++++--
>> fs/ext4/sysfs.c   |  4 ++++
>> 3 files changed, 18 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 03db3e71676c..d4e47fdad87c 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1480,6 +1480,8 @@ struct ext4_sb_info {
>> 	/* where last allocation was done - for stream allocation */
>> 	unsigned long s_mb_last_group;
>> 	unsigned long s_mb_last_start;
>> +	unsigned int s_mb_toscan0;
>> +	unsigned int s_mb_toscan1;
>> 
>> 	/* stats for buddy allocator */
>> 	atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a3e2767bdf2f..cebd7d8df0b8 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2098,7 +2098,7 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>> static noinline_for_stack int
>> ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> {
>> -	ext4_group_t ngroups, group, i;
>> +	ext4_group_t ngroups, toscan, group, i;
>> 	int cr;
>> 	int err = 0, first_err = 0;
>> 	struct ext4_sb_info *sbi;
>> @@ -2169,7 +2169,15 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> 		 */
>> 		group = ac->ac_g_ex.fe_group;
>> 
>> -		for (i = 0; i < ngroups; group++, i++) {
>> +		/* limit number of groups to scan at the first two rounds
>> +		 * when we hope to find something really good */
>> +		toscan = ngroups;
>> +		if (cr == 0)
>> +			toscan = sbi->s_mb_toscan0;
>> +		else if (cr == 1)
>> +			toscan = sbi->s_mb_toscan1;
>> +
>> +		for (i = 0; i < toscan; group++, i++) {
>> 			int ret = 0;
>> 			cond_resched();
>> 			/*
>> @@ -2872,6 +2880,8 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
>> 			bio_put(discard_bio);
>> 		}
>> 	}
>> +	sbi->s_mb_toscan0 = 1024;
>> +	sbi->s_mb_toscan1 = 4096;
>> 
>> 	list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
>> 		ext4_free_data_in_buddy(sb, entry);
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index eb1efad0e20a..c96ee20f5487 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
>> @@ -198,6 +198,8 @@ EXT4_RO_ATTR_ES_UI(errors_count, s_error_count);
>> EXT4_ATTR(first_error_time, 0444, first_error_time);
>> EXT4_ATTR(last_error_time, 0444, last_error_time);
>> EXT4_ATTR(journal_task, 0444, journal_task);
>> +EXT4_RW_ATTR_SBI_UI(mb_toscan0, s_mb_toscan0);
>> +EXT4_RW_ATTR_SBI_UI(mb_toscan1, s_mb_toscan1);
>> 
>> static unsigned int old_bump_val = 128;
>> EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val);
>> @@ -228,6 +230,8 @@ static struct attribute *ext4_attrs[] = {
>> 	ATTR_LIST(first_error_time),
>> 	ATTR_LIST(last_error_time),
>> 	ATTR_LIST(journal_task),
>> +	ATTR_LIST(mb_toscan0),
>> +	ATTR_LIST(mb_toscan1),
>> 	NULL,
>> };
>> ATTRIBUTE_GROUPS(ext4);
>> -- 
>> 2.20.1
>> 
>> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ