[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <5C1A2791-92E9-46EB-9A6F-BD32D8A12368@dilger.ca>
Date: Tue, 15 May 2018 14:22:21 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Wang Shilong <wangshilong1991@...il.com>
Cc: Ext4 Developers List <linux-ext4@...r.kernel.org>,
Wang Shilong <wshilong@....com>
Subject: Re: [PATCH] ext4: add an interface to load block bitmaps
On May 15, 2018, at 7:06 AM, Wang Shilong <wangshilong1991@...il.com> wrote:
>
> From: Wang Shilong <wshilong@....com>
>
> During our benchmarking, we found sometimes writing
> performances are not stable enough and there are some
s/performances/performance/
> small read during write which could drop throughput(~30%).
>
> It turned out that block bitmaps loading could make
s/make/add/
> some latency here,also for a heavy fragmented filesystem,
> we might need load many bitmaps to find some free blocks.
>
> To improve above situation, we had a patch to load block
s/had/have/
> bitmaps to memory and pin those bitmaps memory until umount
... in memory ...
> or we release the memory on purpose, this could stable write
... on purpose. This can stabilize write
> performances and improve performances of a heavy fragmented
s/performances/performance/g s/heavy/heavily/
> filesystem.
>
> Tested-by: Shuichi Ihara <sihara@....com>
> Signed-off-by: Wang Shilong <wshilong@....com>
> ---
> fs/ext4/balloc.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> fs/ext4/ext4.h | 12 +++++++
> fs/ext4/super.c | 3 ++
> fs/ext4/sysfs.c | 26 ++++++++++++++
> 4 files changed, 146 insertions(+)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index b00481c..ceb63e8 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -505,6 +505,8 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
> EXT4_GROUP_INFO_BBITMAP_CORRUPT);
> return -EIO;
> }
> + /* race is fine */
> + EXT4_SB(sb)->bbitmaps_read_cnt++;
> clear_buffer_new(bh);
> /* Panic or remount fs read-only if block bitmap is invalid */
> return ext4_validate_block_bitmap(sb, desc, block_group, bh);
> @@ -660,6 +662,109 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle,
> +int ext4_load_block_bitmaps_bh(struct super_block *sb, unsigned int op)
This should take a named enum as parameter instead of "unsigned int".
It would probably be better to integrate this function into the existing
group descriptor handling in ext4_mb_init_backend->ext4_mb_add_groupinfo()
since it is already iterating all of the group descriptors, and it also
handles the case of groups being added by online filesystem resizing.
> +{
> + struct buffer_head *bitmap_bh;
> + struct ext4_group_desc *gdp;
> + ext4_group_t i, j;
> + ext4_group_t ngroups = ext4_get_groups_count(sb);
> + ext4_group_t cnt = 0;
> +
> + if (op < EXT4_LOAD_BBITMAPS || op > EXT4_PIN_BBITMAPS)
> + return -EINVAL;
> +
> + mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + /* don't pin bitmaps several times */
> + if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS) {
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + return 0;
> + }
> +
> + for (i = 0; i < ngroups; i++) {
> + gdp = ext4_get_group_desc(sb, i, NULL);
> + if (!gdp)
> + continue;
> + /* Load is simple, we could tolerate any
> + * errors and continue to handle, but for
> + * pin we return directly for simple handling
> + * in unpin codes, otherwiese we need remember
(typo) s/otherwiese/otherwise/ ... we need to remember
> + * which block bitmaps we pin exactly.
> + */
It should be straightforward to track the pinned bitmaps, by referencing
the buffer head in ext4_group_info with a new field like bb_block_bh.
That would allow the bitmaps to be pinned, but if there is some error
loading one of them it doesn't prevent the filesystem from being mounted.
I don't think that an error pinning the bitmaps should be a mount failure,
since this could happen if the system is short of memory for some reason.
It would be possible to check in ext4_read_block_bitmap() if the current
goal is to pin the bitmap that isn't loaded then a refcount is added to
bb_block_bh.
> + bitmap_bh = ext4_read_block_bitmap(sb, i);
> + if (IS_ERR(bitmap_bh)) {
> + if (op == EXT4_LOAD_BBITMAPS)
> + continue;
> + else
(style) no need for "else" after "continue"
> + goto failed;
> + }
> + if (op == EXT4_LOAD_BBITMAPS)
> + brelse(bitmap_bh);
> + cnt++;
> + }
> + /* Reset block bitmap to zero now */
> + EXT4_SB(sb)->bbitmaps_read_cnt = 0;
> + ext4_msg(sb, KERN_INFO, "%s %u block bitmaps finished",
> + op == EXT4_PIN_BBITMAPS ? "pin" : "load", cnt);
> + EXT4_SB(sb)->s_load_bbitmaps = EXT4_PIN_BBITMAPS;
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +
> + return 0;
> +failed:
> + for (j = 0; j < i; j++) {
> + gdp = ext4_get_group_desc(sb, i, NULL);
> + if (!gdp)
> + continue;
> + bitmap_bh = ext4_read_block_bitmap(sb, i);
> + if (!IS_ERR(bitmap_bh)) {
> + brelse(bitmap_bh);
> + brelse(bitmap_bh);
> + }
> + }
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + return PTR_ERR(bitmap_bh);
> +}
> +
> +void ext4_unpin_block_bitmaps_bh(struct super_block *sb)
> +{
> + struct buffer_head *bitmap_bh;
> + struct ext4_group_desc *gdp;
> + ext4_group_t i;
> + ext4_group_t ngroups = ext4_get_groups_count(sb);
> + ext4_group_t cnt = 0;
> +
> + mutex_lock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + if (EXT4_SB(sb)->s_load_bbitmaps == EXT4_UNPIN_BBITMAPS) {
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + return;
> + }
> +
> + ext4_msg(sb, KERN_INFO,
> + "Read block block bitmaps: %lu afer %s",
(typo) s/afer/after/
> + EXT4_SB(sb)->bbitmaps_read_cnt,
> + EXT4_SB(sb)->s_load_bbitmaps == EXT4_PIN_BBITMAPS ?
> + "pin" : "load");
> +
> + if (EXT4_SB(sb)->s_load_bbitmaps != EXT4_PIN_BBITMAPS) {
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> + return;
> + }
> +
> + for (i = 0; i < ngroups; i++) {
> + gdp = ext4_get_group_desc(sb, i, NULL);
This should be changed to:
struct ext4_group_info *grinfo = ext4_get_group_info(sb, i);
and then only brelse the bufferhead if it was pinned properly:
if (!IS_ERR(grinfo->bb_block_bh))
brelse(grinfo->bb_block_bh);
grinfo->bb_block_bh = NULL;
In theory we could also optimize ext4_read_block_bitmap() to add a
reference to bb_block_bh directly instead of the extra work done there.
> + if (!gdp)
> + continue;
> + bitmap_bh = ext4_read_block_bitmap(sb, i);
> + if (IS_ERR(bitmap_bh))
> + continue;
> + brelse(bitmap_bh);
> + brelse(bitmap_bh);
> + cnt++;
> + }
> + ext4_msg(sb, KERN_INFO, "Unpin %u lock bitmaps finished", cnt);
> + EXT4_SB(sb)->s_load_bbitmaps = EXT4_UNPIN_BBITMAPS;
> + mutex_unlock(&EXT4_SB(sb)->s_load_bbitmaps_lock);
> +}
> +
> /**
> * ext4_count_free_clusters() -- count filesystem free clusters
> * @sb: superblock
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index fa52b7d..4f9ee73 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1317,6 +1317,12 @@ struct ext4_super_block {
> /* Number of quota types we support */
> #define EXT4_MAXQUOTAS 3
>
> +enum {
This should be a named enum, like ext4_bitmap_state.
> + EXT4_UNPIN_BBITMAPS = 0,
> + EXT4_LOAD_BBITMAPS,
> + EXT4_PIN_BBITMAPS,
Since these values are exported to userspace, they should have specific
values assigned. Also, we should consider the case of being able to
load/pin inode bitmaps. Something like:
enum ext4_bitmap_state {
EXT4_UNPIN_BITMAPS = 0x00,
EXT4_LOAD_BBITMAPS = 0x01,
EXT4_PIN_BBITMAPS = 0x02,
EXT4_LOAD_IBITMAPS = 0x04,
EXT4_PIN_IBITMAPS = 0x08,
};
> diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
> index 9ebd26c..89396b3f 100644
> --- a/fs/ext4/sysfs.c
> +++ b/fs/ext4/sysfs.c
>
> +static ssize_t load_bbitmaps_store(struct ext4_sb_info *sbi,
> + const char *buf, size_t count)
I'd suggest to take "b" (block) out of the name here so that this
can also be used for inode bitmaps in the future.
> +{
> + unsigned long long val;
> + int ret;
> +
> + ret = kstrtoull(skip_spaces(buf), 0, &val);
> + if (ret || val > EXT4_PIN_BBITMAPS)
> + return -EINVAL;
> +
> + if (val == EXT4_UNPIN_BBITMAPS)
> + ext4_unpin_block_bitmaps_bh(sbi->s_sb);
> + else if (val > EXT4_UNPIN_BBITMAPS)
> + ret = ext4_load_block_bitmaps_bh(sbi->s_sb, val);
This is not a great userspace interface, since the EXT4_*_BBITMAPS
values don't have specific values. It would be nicer to accept
string names like "block" or "block-pin" or "inode" or "inode-pin",
and "unpin", but at least if there are fixed numbers (something like
"echo 6 > /sys/fs/ext4/sda/load_bitmaps" to load both bitmaps and
also pin the block bitmaps) the userspace interface won't change.
> + return ret ? ret : count;
> +}
>
> @@ -270,6 +291,9 @@ static ssize_t ext4_attr_show(struct kobject *kobj,
> return snprintf(buf, PAGE_SIZE, "%llu\n",
> (unsigned long long)
> atomic64_read(&sbi->s_resv_clusters));
> + case attr_load_bbitmaps:
> + return snprintf(buf, PAGE_SIZE, "%u\n",
> + sbi->s_load_bbitmaps);
This should use "%#x" so that we know which flags are set.
Cheers, Andreas
Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)
Powered by blists - more mailing lists