[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170821080220.GA19196@quack2.suse.cz>
Date: Mon, 21 Aug 2017 10:02:20 +0200
From: Jan Kara <jack@...e.cz>
To: Wang Shilong <wangshilong1991@...il.com>
Cc: linux-ext4@...r.kernel.org, sihara@....com, tytso@....edu,
adilger@...ger.ca, lixi@....com, wshilong@....com, jack@...e.cz
Subject: Re: [PATCH 3/3 v5] ext4: reduce lock contention in __ext4_new_inode
On Sun 20-08-17 19:54:37, Wang Shilong wrote:
> From: Wang Shilong <wshilong@....com>
>
> While running number of creating file threads concurrently,
> we found heavy lock contention on group spinlock:
>
> FUNC TOTAL_TIME(us) COUNT AVG(us)
> ext4_create 1707443399 1440000 1185.72
> _raw_spin_lock 1317641501 180899929 7.28
> jbd2__journal_start 287821030 1453950 197.96
> jbd2_journal_get_write_access 33441470 73077185 0.46
> ext4_add_nondir 29435963 1440000 20.44
> ext4_add_entry 26015166 1440049 18.07
> ext4_dx_add_entry 25729337 1432814 17.96
> ext4_mark_inode_dirty 12302433 5774407 2.13
>
> most of cpu time blames to _raw_spin_lock, here is some testing
> numbers with/without patch.
>
> Test environment:
> Server : SuperMicro Sever (2 x E5-2690 v3@...0GHz, 128GB 2133MHz
> DDR4 Memory, 8GbFC)
> Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
> Read Intensive SSD)
>
> format command:
> mkfs.ext4 -J size=4096
>
> test command:
> mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
> -r -i 1 -v -p 10 -u #first run to load inode
>
> mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
> -r -i 3 -v -p 10 -u
>
> Kernel version: 4.13.0-rc3
>
> Test 1,440,000 files with 48 directories by 48 processes:
>
> Without patch:
>
> File Creation File removal
> 79,033 289,569 ops/per second
> 81,463 285,359
> 79,875 288,475
>
> With patch:
> File Creation File removal
> 810669 301694
> 812805 302711
> 813965 297670
>
> Creation performance is improved more than 10X with large
> journal size. The main problem here is we test bitmap
> and do some check and journal operations which could be
> slept, then we test and set with lock hold, this could
> be racy, and make 'inode' steal by other process.
>
> However, after first try, we could confirm handle has
> been started and inode bitmap journaled too, then
> we could find and set bit with lock hold directly, this
> will mostly gurateee success with second try.
>
> Tested-by: Shuichi Ihara <sihara@....com>
> Signed-off-by: Wang Shilong <wshilong@....com>
I like the result! You can add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
> ---
> v4->v5: better refactor codes as Jan Kara suggested.
> ---
> fs/ext4/ialloc.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 39 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 9e6eb1c..fb83a36 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -730,6 +730,27 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
> return ret;
> }
>
> +static int find_inode_bit(struct super_block *sb, ext4_group_t group,
> + struct buffer_head *bitmap, unsigned long *ino)
> +{
> +next:
> + *ino = ext4_find_next_zero_bit((unsigned long *)
> + bitmap->b_data,
> + EXT4_INODES_PER_GROUP(sb), *ino);
> + if (*ino >= EXT4_INODES_PER_GROUP(sb))
> + return 0;
> +
> + if ((EXT4_SB(sb)->s_journal == NULL) &&
> + recently_deleted(sb, group, *ino)) {
> + *ino = *ino + 1;
> + if (*ino < EXT4_INODES_PER_GROUP(sb))
> + goto next;
> + return 0;
> + }
> +
> + return 1;
> +}
> +
> /*
> * There are two policies for allocating an inode. If the new inode is
> * a directory, then a forward search is made for a block group with both
> @@ -910,21 +931,16 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> }
>
> repeat_in_this_group:
> - ino = ext4_find_next_zero_bit((unsigned long *)
> - inode_bitmap_bh->b_data,
> - EXT4_INODES_PER_GROUP(sb), ino);
> - if (ino >= EXT4_INODES_PER_GROUP(sb))
> + ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> + if (!ret2)
> goto next_group;
> - if (group == 0 && (ino+1) < EXT4_FIRST_INO(sb)) {
> +
> + if (group == 0 && (ino + 1) < EXT4_FIRST_INO(sb)) {
> ext4_error(sb, "reserved inode found cleared - "
> "inode=%lu", ino + 1);
> goto next_group;
> }
> - if ((EXT4_SB(sb)->s_journal == NULL) &&
> - recently_deleted(sb, group, ino)) {
> - ino++;
> - goto next_inode;
> - }
> +
> if (!handle) {
> BUG_ON(nblocks <= 0);
> handle = __ext4_journal_start_sb(dir->i_sb, line_no,
> @@ -944,11 +960,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
> }
> ext4_lock_group(sb, group);
> ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
> + if (ret2) {
> + /* Someone already took the bit. Repeat the search
> + * with lock held.
> + */
> + ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
> + if (ret2) {
> + ext4_set_bit(ino, inode_bitmap_bh->b_data);
> + ret2 = 0;
> + } else {
> + ret2 = 1; /* we didn't grab the inode */
> + }
> + }
> ext4_unlock_group(sb, group);
> ino++; /* the inode bitmap is zero-based */
> if (!ret2)
> goto got; /* we grabbed the inode! */
> -next_inode:
> +
> if (ino < EXT4_INODES_PER_GROUP(sb))
> goto repeat_in_this_group;
> next_group:
> --
> 2.9.3
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists