Message-ID: <20250806161748.76651-1-ryncsn@gmail.com>
Date: Thu,  7 Aug 2025 00:17:45 +0800
From: Kairui Song <ryncsn@...il.com>
To: linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Kemeng Shi <shikemeng@...weicloud.com>,
	Chris Li <chrisl@...nel.org>,
	Nhat Pham <nphamcs@...il.com>,
	Baoquan He <bhe@...hat.com>,
	Barry Song <baohua@...nel.org>,
	"Huang, Ying" <ying.huang@...ux.alibaba.com>,
	linux-kernel@...r.kernel.org,
	Kairui Song <kasong@...cent.com>
Subject: [PATCH v2 0/3] mm, swap: improve cluster scan strategy

From: Kairui Song <kasong@...cent.com>

This series improves large allocation performance and reduces the
failure rate. After thorough testing, some design choices in the
cluster allocator turned out to be improvable.

The allocator spends too much effort scanning the fragment list, which
does not help in most setups but causes serious contention on the
list lock (si->lock). In addition, for historical reasons the allocator
prefers free clusters when searching for a new cluster, which causes
fragmentation issues.

So make the allocator scan only one cluster for high order allocations,
and prefer nonfull clusters over free ones. This both improves
performance and reduces fragmentation.
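
The search order described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the code in
mm/swapfile.c: the names Cluster, alloc and SLOTS_PER_CLUSTER are
hypothetical, locking and per-order lists are left out, and letting
order 0 scan the whole fragment list is an assumption of the model.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

SLOTS_PER_CLUSTER = 16  # toy size (assumption), kept small for readability


@dataclass
class Cluster:
    name: str
    used: List[bool] = field(
        default_factory=lambda: [False] * SLOTS_PER_CLUSTER)

    def try_alloc(self, order: int) -> Optional[int]:
        """Claim 2**order naturally aligned free slots, or return None."""
        n = 1 << order
        if n > len(self.used):
            return None
        for start in range(0, len(self.used) - n + 1, n):
            if not any(self.used[start:start + n]):
                self.used[start:start + n] = [True] * n
                return start
        return None


def alloc(order: int,
          nonfull: Deque[Cluster],
          free: Deque[Cluster],
          frag: Deque[Cluster]) -> Optional[Tuple[str, int]]:
    """Return (cluster name, offset), or None to signal a fallback."""
    # 1. Prefer clusters that are already partially used.
    for cluster in nonfull:
        offset = cluster.try_alloc(order)
        if offset is not None:
            return cluster.name, offset

    # 2. Only then consume a completely free cluster.
    if free:
        cluster = free.popleft()
        nonfull.append(cluster)
        offset = cluster.try_alloc(order)
        if offset is not None:
            return cluster.name, offset

    # 3. Fragment list: a high order request tries a single cluster and
    #    rotates it to the tail; order 0 scans the whole list (assumption).
    limit = 1 if order > 0 else len(frag)
    for _ in range(min(limit, len(frag))):
        cluster = frag.popleft()
        frag.append(cluster)
        offset = cluster.try_alloc(order)
        if offset is not None:
            return cluster.name, offset
    return None
```

In this model a high order request gives up after one fragmented
cluster instead of walking the whole list, which is the part that
bounds the time spent under the list lock.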

For example, a kernel build test with make -j96 and 10G ZRAM with 64kB
mTHP enabled shows better performance and a lower failure rate:

Before: sys time: 11609.69s  64kB/swpout: 1787051  64kB/swpout_fallback: 20917
After:  sys time: 5587.53s   64kB/swpout: 1811598  64kB/swpout_fallback: 0

System time is roughly cut in half (about 52% lower), and the 64kB
fallback count drops to zero. Larger allocations in a hybrid workload
also show a major improvement:

512kB swap failure rate:
Before: swpout:11663  swpout_fallback:1767
After:  swpout:14480  swpout_fallback:6

2M swap failure rate:
Before: swpout:24     swpout_fallback:1442
After:  swpout:1329   swpout_fallback:7
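
The rates implied by the counters above can be worked out with a few
lines of Python. This is a sketch, assuming the failure rate is
swpout_fallback / (swpout + swpout_fallback); that formula and the
helper name fallback_rate are assumptions for illustration, not
something this cover letter states.

```python
def fallback_rate(swpout: int, swpout_fallback: int) -> float:
    """Share of swap-out attempts that fell back (assumed definition)."""
    attempts = swpout + swpout_fallback
    return swpout_fallback / attempts if attempts else 0.0


# (swpout, swpout_fallback) pairs copied from the results above.
RESULTS = {
    "64kB":  {"before": (1787051, 20917), "after": (1811598, 0)},
    "512kB": {"before": (11663, 1767),    "after": (14480, 6)},
    "2M":    {"before": (24, 1442),       "after": (1329, 7)},
}

# System time of the kernel build test, in seconds.
SYS_TIME_BEFORE = 11609.69
SYS_TIME_AFTER = 5587.53


def sys_time_reduction() -> float:
    """Fraction of system time saved by the series."""
    return 1.0 - SYS_TIME_AFTER / SYS_TIME_BEFORE


if __name__ == "__main__":
    for size, runs in RESULTS.items():
        before = fallback_rate(*runs["before"])
        after = fallback_rate(*runs["after"])
        print(f"{size}: {before:.2%} -> {after:.2%}")
    print(f"sys time reduction: {sys_time_reduction():.1%}")
```

Under that definition, the 2M case moves from almost every attempt
falling back to well under one percent.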

Kairui Song (3):
  mm, swap: only scan one cluster in fragment list
  mm, swap: remove fragment clusters counter
  mm, swap: prefer nonfull over free clusters

 include/linux/swap.h |  1 -
 mm/swapfile.c        | 68 +++++++++++++++++++++++---------------------
 2 files changed, 36 insertions(+), 33 deletions(-)

---

V1: https://lore.kernel.org/linux-mm/20250804172439.2331-1-ryncsn@gmail.com/
Changelog:
- Split into 3 patches, no code change [ Chris Li ]
- Rebase and rerun the test to see if removing the fragment cluster counter
  helps to improve the performance. As expected, the gain is marginal.
- Collect Acked-by/Reviewed-by [ Nhat Pham, Chris Li ]

-- 
2.50.1

