Date:   Wed,  2 Aug 2017 12:41:38 +0800
From:   Wang Shilong <wangshilong1991@...il.com>
To:     linux-ext4@...r.kernel.org
Cc:     tytso@....edu, wshilong@....com, adilger@...ger.ca, sihara@....com,
        lixi@....com
Subject: [PATCH] ext4: reduce lock contention in __ext4_new_inode

From: Wang Shilong <wshilong@....com>

While running a number of file-creation threads concurrently, we
found heavy lock contention on the group spinlock (a simplified
sketch of the contention pattern follows the profile below):

FUNC                           TOTAL_TIME(us)       COUNT        AVG(us)
ext4_create                    1707443399           1440000      1185.72
_raw_spin_lock                 1317641501           180899929    7.28
jbd2__journal_start            287821030            1453950      197.96
jbd2_journal_get_write_access  33441470             73077185     0.46
ext4_add_nondir                29435963             1440000      20.44
ext4_add_entry                 26015166             1440049      18.07
ext4_dx_add_entry              25729337             1432814      17.96
ext4_mark_inode_dirty          12302433             5774407      2.13
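
The hot path is the claim-retry loop in __ext4_new_inode(): every
task scans the group's inode bitmap locklessly, finds the same first
zero bit, then serializes on the group lock to claim it; the losers
bump the bit and respin immediately. Below is a minimal userspace
sketch of that pattern. The names (alloc_ino, find_next_zero), the
pthread spinlock, and the bitmap layout are simplified stand-ins for
the ext4 internals, not the actual kernel code.

/*
 * Simplified stand-in for the allocation loop in __ext4_new_inode().
 * Not the real ext4 code; it only reproduces the contention pattern:
 * a lockless bitmap scan followed by a locked test-and-set.
 * Build: cc -O2 -pthread sketch.c
 */
#include <pthread.h>

#define INODES_PER_GROUP 8192UL
#define BITS_PER_LONG	 (8 * sizeof(unsigned long))

static unsigned long bitmap[INODES_PER_GROUP / BITS_PER_LONG];
static pthread_spinlock_t group_lock;	/* plays the group spinlock */

static int test_bit(unsigned long nr)
{
	return (bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* ext4_find_next_zero_bit() stand-in: first clear bit at or after @off */
static unsigned long find_next_zero(unsigned long off)
{
	while (off < INODES_PER_GROUP && test_bit(off))
		off++;
	return off;
}

static long alloc_ino(void)
{
	unsigned long ino = 0;

	for (;;) {
		/* Lockless scan: concurrent tasks find the SAME zero bit... */
		ino = find_next_zero(ino);
		if (ino >= INODES_PER_GROUP)
			return -1;	/* group full; caller tries next group */

		/* ...then all of them pile onto the group lock to claim it. */
		pthread_spin_lock(&group_lock);
		if (!test_bit(ino)) {
			bitmap[ino / BITS_PER_LONG] |= 1UL << (ino % BITS_PER_LONG);
			pthread_spin_unlock(&group_lock);
			return (long)ino;	/* we grabbed the inode */
		}
		pthread_spin_unlock(&group_lock);
		ino++;	/* lost the race; retry at once -> lock storm */
	}
}

int main(void)
{
	pthread_spin_init(&group_lock, PTHREAD_PROCESS_PRIVATE);
	return alloc_ino() < 0;	/* run alloc_ino() from N threads to see it */
}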

Most of the CPU time goes to _raw_spin_lock. Here are some test
numbers with and without the patch.

Test environment:
Server : SuperMicro Server (2 x E5-2690 v3@...0GHz, 128GB 2133MHz
	 DDR4 Memory, 8GbFC)
Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
	  Read Intensive SSD)

format command:
	mkfs.ext4 -J size=4096

test command:
	mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
		-r -i 5 -v -p 10 -u

Kernel version: 4.13.0-rc3

Test: create and remove 1,440,000 files across 48 directories with
48 processes:

Without patch:

File creation	File removal	(ops per second)
79,033		289,569
81,463		285,359
79,875		288,475
79,917		284,624
79,420		290,91

With patch:
File creation	File removal	(ops per second)
302,600		312,813
295,644		316,557
288,125		306,961
302,960		310,517
295,175		311,927

Creation and removal performance are now similar: creation
performance improves by more than 3x with a large journal size, and
by 50% with the default journal size. The patch achieves this by
bounding the retry loop: after two consecutive failed claims in the
same group, the task sleeps for 1ms before retrying instead of
respinning on the group lock.
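
In terms of the sketch above (reusing its helpers), the patch amounts
to the bounded backoff below. This is illustrative only: in the
kernel the sleep is schedule_timeout_uninterruptible(msecs_to_jiffies(1)),
for which nanosleep() stands in here.

#include <time.h>

/* alloc_ino() with the patch's backoff folded in (illustrative only) */
static long alloc_ino_backoff(void)
{
	unsigned long ino = 0;
	unsigned int attempt = 0;

	for (;;) {
		ino = find_next_zero(ino);
		if (ino >= INODES_PER_GROUP)
			return -1;

		attempt++;
		pthread_spin_lock(&group_lock);
		if (!test_bit(ino)) {
			bitmap[ino / BITS_PER_LONG] |= 1UL << (ino % BITS_PER_LONG);
			pthread_spin_unlock(&group_lock);
			return (long)ino;
		}
		pthread_spin_unlock(&group_lock);
		ino++;
		if (attempt >= 2) {
			/* Two straight losses: the lock is contended.
			 * Sleep ~1ms off-lock so the winners can proceed. */
			const struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };
			nanosleep(&ts, NULL);
			attempt = 0;
		}
	}
}

The threshold of 2 and the 1ms delay mirror the patch; they trade a
small added latency on the contended path for far less time spent
spinning on the group lock.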

Tested-by: Shuichi Ihara <sihara@....com>
Signed-off-by: Wang Shilong <wshilong@....com>
---
 fs/ext4/ialloc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 507bfb3..18aeaf4 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -761,6 +761,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 	ext4_group_t flex_group;
 	struct ext4_group_info *grp;
 	int encrypt = 0;
+	unsigned int attempt;
 
 	/* Cannot create files in a deleted directory */
 	if (!dir || !dir->i_nlink)
@@ -917,6 +918,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 			continue;
 		}
 
+		attempt = 0;
 repeat_in_this_group:
 		ino = ext4_find_next_zero_bit((unsigned long *)
 					      inode_bitmap_bh->b_data,
@@ -933,6 +935,9 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 			ino++;
 			goto next_inode;
 		}
+
+		attempt++;
+
 		if (!handle) {
 			BUG_ON(nblocks <= 0);
 			handle = __ext4_journal_start_sb(dir->i_sb, line_no,
@@ -957,8 +962,14 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
 		if (!ret2)
 			goto got; /* we grabbed the inode! */
 next_inode:
-		if (ino < EXT4_INODES_PER_GROUP(sb))
+		if (ino < EXT4_INODES_PER_GROUP(sb)) {
+			/* Lock contention, relax a bit */
+			if (attempt >= 2) {
+				schedule_timeout_uninterruptible(msecs_to_jiffies(1));
+				attempt = 0;
+			}
 			goto repeat_in_this_group;
+		}
 next_group:
 		if (++group == ngroups)
 			group = 0;
-- 
2.9.3
