Date:   Mon, 27 Apr 2020 04:03:24 +0000
From:   Alex Zhuravlev <azhuravlev@...mcloud.com>
To:     "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: [PATCH 2/2] ext4: skip non-loaded groups at cr=0/1

Hi, yet another patch.

cr=0 is supposed to be an optimization that saves CPU cycles, but if the in-memory buddy data
is not initialized, it makes no sense: we have to do synchronous I/O, which takes far more cycles.
Also, at cr=0 mballoc doesn't store any available chunk, and cr=1 skips groups using a heuristic
based on average fragment size. It's more useful to skip such groups and switch to cr=2, where
groups will be scanned for available chunks.
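
To illustrate the decision described above, here is a minimal user-space sketch. It is not the kernel code: struct group_model, its fields, and consider_group() are hypothetical names made up for this example. It assumes that a group flagged BLOCK_UNINIT on disk can be initialized without reading a bitmap, which is why such groups are still initialized during the cheap passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one block group; not the kernel's data structure. */
struct group_model {
	bool buddy_loaded;   /* in-memory buddy data is already initialized */
	bool block_uninit;   /* on-disk descriptor has the BLOCK_UNINIT flag */
};

/*
 * Decide whether a group is considered at criteria level cr (0..3).
 * *needs_init is set when the caller would have to initialize buddy data
 * before scanning the group.
 */
static bool consider_group(const struct group_model *g, int cr,
			   bool *needs_init)
{
	assert(cr >= 0 && cr <= 3);
	*needs_init = false;

	/* Buddy data already in memory: the cheap passes stay cheap. */
	if (g->buddy_loaded)
		return true;

	/*
	 * cr=0/1 are optimistic passes. Skip a group whose buddy data
	 * would have to be built from a bitmap read from disk.
	 */
	if (cr < 2 && !g->block_uninit)
		return false;

	/* cr>=2, or a BLOCK_UNINIT group: initialize, then scan. */
	*needs_init = true;
	return true;
}
```

In this model a group that is not loaded is skipped at cr=0 and cr=1 and is first initialized at cr=2, which matches the intent of the patch below.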

A 120TB device was simulated using a sparse image and a dm-slow virtual device. The image was then
formatted and filled using debugfs to mark ~85% of the available space as busy. Without the patch,
mount couldn't complete in half an hour (according to vmstat, it would have taken ~10-11 hours).
With the patch applied, mount took ~20 seconds.

Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
Signed-off-by: Alex Zhuravlev <bzzz@...mcloud.com>
Reviewed-by: Andreas Dilger <adilger@...mcloud.com>
---
 fs/ext4/mballoc.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e84c298e739b..83e3e6ab1240 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 	return 0;
 }
 
+static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
+				    ext4_group_t group)
+{
+	struct ext4_group_desc *desc;
+
+	if (!ext4_has_group_desc_csum(sb))
+		return 0;
+
+	desc = ext4_get_group_desc(sb, group, NULL);
+	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+		return 1;
+
+	return 0;
+}
+
 /*
  * The routine scans buddy structures (not bitmap!) from given order
  * to max order and tries to find big enough chunk to satisfy the req
@@ -2060,7 +2075,15 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense */
+
+		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group))
+			return 0;
+		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
-- 


