Message-Id: <5BBA0C9A-E028-48E0-85F8-79E57A1A912B@gmail.com>
Date: Thu, 30 May 2019 21:05:56 +0300
From: Artem Blagodarenko <artem.blagodarenko@...il.com>
To: Andreas Dilger <adilger@...ger.ca>
Cc: linux-ext4 <linux-ext4@...r.kernel.org>, adilger.kernel@...ger.ca,
Alexey Lyashkov <alexey.lyashkov@...il.com>
Subject: Re: [RFC PATCH] don't search large block range if disk is full
Hello Andreas,
Thank you for the feedback!
I really wanted to send a new version of this patch (with test results, but without the kernel decision-maker) this evening, but you were faster.
> On 30 May 2019, at 19:56, Andreas Dilger <adilger@...ger.ca> wrote:
>
> Artem, we discussed this patch on the Ext4 concall today. A couple
> of items came up during discussion:
> - the patch submission should include performance results to
> show that the patch is providing an improvement
> - it would be preferable if the thresholds for the stages were found
> dynamically in the kernel based on how many groups have been skipped
> and the free chunk size in each group
> - there would need to be some way to dynamically reset the scanning
> level when lots of blocks have been freed
>
> Cheers, Andreas
My suggestion is to split this plan into two phases.
Phase 1 - the loop-skipping code plus a user-space interface that gives the administrator the ability to configure it.
Phase 2 - an in-kernel decision-maker based on group information (and some other inputs); a rough sketch follows.
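For discussion, here is a very rough sketch of what the Phase 2 decision-maker
could look like. ext4_mb_adjust_start_cr(), s_mb_start_cr and
s_mb_useless_scan_limit are hypothetical names and not part of the posted
patch; ac_groups_scanned already exists in struct ext4_allocation_context:

static void ext4_mb_adjust_start_cr(struct ext4_allocation_context *ac)
{
	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);

	/* Too many groups were scanned without success at the current
	 * level: start future allocations one criterion higher. */
	if (ac->ac_groups_scanned > sbi->s_mb_useless_scan_limit &&
	    atomic_read(&sbi->s_mb_start_cr) < 3)
		atomic_inc(&sbi->s_mb_start_cr);
}

The block-free path would do the opposite and reset s_mb_start_cr to 0 once
enough blocks have been freed, which would cover the third item above.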
Here are the test results I wanted to add to the new patch version; adding them here for discussion.
During the test, the file system was fragmented with a "50 free blocks - 50
occupied blocks" pattern. Write performance degraded from 1.2 GB/sec to 10 MB/sec:
68719476736 bytes (69 GB) copied, 6619.02 s, 10.4 MB/s
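For reference, one possible way to produce such a pattern (my sketch,
assuming a 4KiB block size and a file "frag" that was written beforehand to
fill the disk) is to punch a hole in every other 50-block chunk:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const off_t blk = 4096, chunk = 50 * blk;
	struct stat st;
	off_t off;
	int fd = open("frag", O_RDWR);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror("frag");
		return 1;
	}
	/* Free every second 50-block chunk, keep the rest allocated. */
	for (off = 0; off + chunk <= st.st_size; off += 2 * chunk)
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, chunk) < 0) {
			perror("fallocate");
			return 1;
		}
	return close(fd);
}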
Let's exclude the cr=0 loops:
echo "60" > /sys/fs/ext4/md0/mb_c1_threshold
Excluding the cr=0 loops doesn't change performance: still 10 MB/s. With
roughly half of the blocks free, free_rate is about 50, which is below the
60% threshold, so allocation starts at cr=1; but the cr=1 scan still walks
all the groups without success. The statistics show that 981753 cr=0 loops
were skipped, while 1664192 cr=1 loops finished without success:
mballoc: (7829, 1664192, 0) useless c(0,1,2) loops
mballoc: (981753, 0, 0) skipped c(0,1,2) loops
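(The statistics code itself is not part of the patch quoted below; here is a
sketch of how the "skipped" counters above could be maintained at the point
where the starting criterion is chosen, with s_mb_skipped_loops[] as a
hypothetical name:)

	if (free_rate < sbi->s_mb_c3_threshold)
		cr = 3;
	else if (free_rate < sbi->s_mb_c2_threshold)
		cr = 2;
	else if (free_rate < sbi->s_mb_c1_threshold)
		cr = 1;

	/* every criterion below the chosen starting one was skipped */
	for (i = 0; i < cr; i++)
		atomic64_inc(&sbi->s_mb_skipped_loops[i]);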
Then both the cr=0 and cr=1 loops were disabled:
echo "60" > /sys/fs/ext4/md0/mb_c1_threshold
echo "60" > /sys/fs/ext4/md0/mb_c2_threshold
mballoc: (0, 0, 0) useless c(0,1,2) loops
mballoc: (1425456, 1393743, 0) skipped c(0,1,2) loops
A lot of cr=0 and cr=1 loops were skipped this time. For the given
fragmentation, write performance returned to ~500 MB/s:
68719476736 bytes (69 GB) copied, 133.066 s, 516 MB/s
This is an example of how performance can be improved for one specific
fragmentation pattern. The patch adds interfaces for adjusting the block
allocator to any such situation; with the default thresholds (25/15/5), for
example, a file system that is more than 75% full has free_rate < 25, so
allocation starts at cr=1.
Best regards,
Artem Blagodarenko.
>> On Mar 11, 2019, at 03:08, Artem Blagodarenko <artem.blagodarenko@...il.com> wrote:
>>
>> Block allocator tries to find:
>> 1) group with the same range as required
>> 2) group with the same average range as required
>> 3) group with required amount of space
>> 4) any group
>>
>> For a nearly full disk, step 1 fails with high
>> probability, but takes a lot of time.
>>
>> Skip the 1st step if the disk is > 75% full
>> Skip the 2nd step if the disk is > 85% full
>> Skip the 3rd step if the disk is > 95% full
>>
>> These three thresholds can be adjusted through the added interface.
>>
>> Signed-off-by: Artem Blagodarenko <c17828@...y.com>
>> ---
>> fs/ext4/ext4.h | 3 +++
>> fs/ext4/mballoc.c | 32 ++++++++++++++++++++++++++++++++
>> fs/ext4/mballoc.h | 3 +++
>> fs/ext4/sysfs.c | 6 ++++++
>> 4 files changed, 44 insertions(+)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 185a05d3257e..fbccb459a296 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1431,6 +1431,9 @@ struct ext4_sb_info {
>> unsigned int s_mb_min_to_scan;
>> unsigned int s_mb_stats;
>> unsigned int s_mb_order2_reqs;
>> + unsigned int s_mb_c1_threshold;
>> + unsigned int s_mb_c2_threshold;
>> + unsigned int s_mb_c3_threshold;
>> unsigned int s_mb_group_prealloc;
>> unsigned int s_max_dir_size_kb;
>> /* where last allocation was done - for stream allocation */
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 4e6c36ff1d55..85f364aa96c9 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2096,6 +2096,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>> return 0;
>> }
>>
>> +static u64 available_blocks_count(struct ext4_sb_info *sbi)
>> +{
>> + ext4_fsblk_t resv_blocks;
>> + u64 bfree;
>> + struct ext4_super_block *es = sbi->s_es;
>> +
>> + resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));
>> + bfree = percpu_counter_sum_positive(&sbi->s_freeclusters_counter) -
>> + percpu_counter_sum_positive(&sbi->s_dirtyclusters_counter);
>> +
>> + bfree = EXT4_C2B(sbi, max_t(s64, bfree, 0));
>> + return bfree - (ext4_r_blocks_count(es) + resv_blocks);
>> +}
>> +
>> static noinline_for_stack int
>> ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> {
>> @@ -2104,10 +2118,13 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>> int err = 0, first_err = 0;
>> struct ext4_sb_info *sbi;
>> struct super_block *sb;
>> + struct ext4_super_block *es;
>> struct ext4_buddy e4b;
>> + unsigned int free_rate;
>>
>> sb = ac->ac_sb;
>> sbi = EXT4_SB(sb);
>> + es = sbi->s_es;
>> ngroups = ext4_get_groups_count(sb);
>> /* non-extent files are limited to low blocks/groups */
>> if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)))
>> @@ -2157,6 +2174,18 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>>
>> /* Let's just scan groups to find more-less suitable blocks */
>> cr = ac->ac_2order ? 0 : 1;
>> +
>> + /* Choose which criterion to start from based on disk fullness */
>> + free_rate = available_blocks_count(sbi) * 100 / ext4_blocks_count(es);
>> +
>> + if (free_rate < sbi->s_mb_c3_threshold) {
>> + cr = 3;
>> + } else if (free_rate < sbi->s_mb_c2_threshold) {
>> + cr = 2;
>> + } else if (free_rate < sbi->s_mb_c1_threshold) {
>> + cr = 1;
>> + }
>> +
>> /*
>> * cr == 0 try to get exact allocation,
>> * cr == 3 try to get anything
>> @@ -2618,6 +2647,9 @@ int ext4_mb_init(struct super_block *sb)
>> sbi->s_mb_stats = MB_DEFAULT_STATS;
>> sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD;
>> sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS;
>> + sbi->s_mb_c1_threshold = MB_DEFAULT_C1_THRESHOLD;
>> + sbi->s_mb_c2_threshold = MB_DEFAULT_C2_THRESHOLD;
>> + sbi->s_mb_c3_threshold = MB_DEFAULT_C3_THRESHOLD;
>> /*
>> * The default group preallocation is 512, which for 4k block
>> * sizes translates to 2 megabytes. However for bigalloc file
>> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
>> index 88c98f17e3d9..d880923e55a5 100644
>> --- a/fs/ext4/mballoc.h
>> +++ b/fs/ext4/mballoc.h
>> @@ -71,6 +71,9 @@ do { \
>> * for which requests use 2^N search using buddies
>> */
>> #define MB_DEFAULT_ORDER2_REQS 2
>> +#define MB_DEFAULT_C1_THRESHOLD 25
>> +#define MB_DEFAULT_C2_THRESHOLD 15
>> +#define MB_DEFAULT_C3_THRESHOLD 5
>>
>> /*
>> * default group prealloc size 512 blocks
>> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
>> index 9212a026a1f1..e4f1d98195c2 100644
>> --- a/fs/ext4/sysfs.c
>> +++ b/fs/ext4/sysfs.c
>> @@ -175,6 +175,9 @@ EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
>> EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
>> EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
>> EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
>> +EXT4_RW_ATTR_SBI_UI(mb_c1_threshold, s_mb_c1_threshold);
>> +EXT4_RW_ATTR_SBI_UI(mb_c2_threshold, s_mb_c2_threshold);
>> +EXT4_RW_ATTR_SBI_UI(mb_c3_threshold, s_mb_c3_threshold);
>> EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
>> EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
>> EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
>> @@ -203,6 +206,9 @@ static struct attribute *ext4_attrs[] = {
>> ATTR_LIST(mb_max_to_scan),
>> ATTR_LIST(mb_min_to_scan),
>> ATTR_LIST(mb_order2_req),
>> + ATTR_LIST(mb_c1_threshold),
>> + ATTR_LIST(mb_c2_threshold),
>> + ATTR_LIST(mb_c3_threshold),
>> ATTR_LIST(mb_stream_req),
>> ATTR_LIST(mb_group_prealloc),
>> ATTR_LIST(max_writeback_mb_bump),
>> --
>> 2.14.3
>>