Message-ID: <20241018192525.95862-1-ryncsn@gmail.com>
Date: Sat, 19 Oct 2024 03:25:25 +0800
From: Kairui Song <ryncsn@...il.com>
To: linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Yosry Ahmed <yosryahmed@...gle.com>,
	Nhat Pham <nphamcs@...il.com>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Chris Li <chrisl@...nel.org>,
	Barry Song <v-songbaohua@...o.com>,
	"Huang, Ying" <ying.huang@...el.com>,
	linux-kernel@...r.kernel.org,
	Kairui Song <kasong@...cent.com>
Subject: [PATCH] mm, zswap: don't touch the XArray lock if there is no entry to free

From: Kairui Song <kasong@...cent.com>

zswap_invalidate() now already avoids touching the XArray if the whole
tree is empty, which is mostly beneficial only when zswap is disabled.
This commit takes it further by also optimizing the case where zswap is
enabled.

To reduce lock overhead, we load the XArray value locklessly first and
keep the walk state, then only perform a locked erase if an entry is
found, thereby minimizing unnecessary XArray lock acquisitions.
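
In XArray terms, the pattern boils down to the following distilled
sketch (a hypothetical standalone helper for illustration, not part of
the patch; the actual change to zswap_invalidate() is in the diff
below):

static void *erase_if_present(struct xarray *xa, unsigned long index)
{
	XA_STATE(xas, xa, index);
	void *entry;

	rcu_read_lock();
	entry = xas_load(&xas);	/* lockless walk, state kept in xas */
	if (entry) {
		/* Lock only when there is actually something to erase. */
		xas_lock(&xas);
		/*
		 * Callers must guarantee the entry cannot be freed
		 * concurrently, as zswap_invalidate() does.
		 */
		WARN_ON_ONCE(xas_reload(&xas) != entry);
		xas_store(&xas, NULL);
		xas_unlock(&xas);
	}
	rcu_read_unlock();
	return entry;
}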

The tests below were done with a 4G brd swap device with the
BLK_FEAT_SYNCHRONOUS flag dropped to simulate a fast SSD device, with
zswap enabled, on a 32-core system:

Swapin of 4G mixed zero and 0x1 filled pages (avg of 12 test runs):
Before:         After (-1.6%):
2315237 us      2277721 us

Swapin of 2G 0x1 filled pages (avg of 24 test runs):
Before:         After (-0.5%):
4623561 us      4600406 us

Linux kernel build test with a 2G memory cgroup limit (avg of 12 test
runs, make -j32):
Before:         After (-0.2%):
1334.35s        1331.63s

Swapin of 2G 0x1 filled pages, but with zswap disabled (avg of 24 test runs):
Before:         After (+0.0%):
2513837 us      2514437 us

The zswap-enabled tests are a little bit faster, and the zswap-disabled
case is practically identical.
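
For reference, a rough sketch of the kind of swapin timing loop used
above (the MADV_PAGEOUT eviction and clock_gettime() timing here are
illustrative assumptions, not necessarily the exact test harness):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>

#define SIZE	(2UL << 30)	/* 2G of 0x1-filled pages */
#define PAGE	4096UL

int main(void)
{
	struct timespec t0, t1;
	unsigned char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	memset(buf, 0x1, SIZE);			/* populate anon pages */
	madvise(buf, SIZE, MADV_PAGEOUT);	/* evict to swap (zswap) */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < SIZE; i += PAGE)	/* fault pages back in */
		*(volatile unsigned char *)(buf + i);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("swapin: %ld us\n",
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000);
	return 0;
}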

Suggested-by: Yosry Ahmed <yosryahmed@...gle.com>
Signed-off-by: Kairui Song <kasong@...cent.com>

---

A previous patch [1] has been Acked and is now in mm-unstable; it is a
valid optimization on its own. This patch was suggested by Yosry during
that discussion. It covers a somewhat different case (zswap enabled
rather than disabled), so instead of sending a V2, I'm sending this as
an incremental optimization, tested a little bit differently.

Link: https://lore.kernel.org/linux-mm/20241011171950.62684-1-ryncsn@gmail.com/ [1]

 mm/zswap.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb23..a5ba80ac8861 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1641,12 +1641,21 @@ void zswap_invalidate(swp_entry_t swp)
 	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 
+	XA_STATE(xas, tree, offset);
+
 	if (xa_empty(tree))
 		return;
 
-	entry = xa_erase(tree, offset);
-	if (entry)
+	rcu_read_lock();
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_lock(&xas);
+		WARN_ON_ONCE(xas_reload(&xas) != entry);
+		xas_store(&xas, NULL);
+		xas_unlock(&xas);
 		zswap_entry_free(entry);
+	}
+	rcu_read_unlock();
 }
 
 int zswap_swapon(int type, unsigned long nr_pages)
-- 
2.47.0

