Message-ID: <CAMgjq7AdauQ8=X0zeih2r21QoV=-WWj1hyBxLWRzq74n-C=-Ng@mail.gmail.com>
Date: Sun, 31 Aug 2025 23:54:32 +0800
From: Kairui Song <ryncsn@...il.com>
To: Chris Li <chrisl@...nel.org>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>, 
	Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	linux-kernel@...r.kernel.org, kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH 7/9] mm, swap: remove contention workaround for swap cache

On Sat, Aug 30, 2025 at 11:24 PM Kairui Song <ryncsn@...il.com> wrote:
>
> On Sat, Aug 30, 2025 at 1:03 PM Chris Li <chrisl@...nel.org> wrote:
> >
> > Hi Kairui,
> >
> > It feels so good to remove that 64M swap cache space. Thank you for
> > making it happen.
> >
> > Some nitpicks follow. I am fine with it as is as well.
> >
> > Acked-by: Chris Li <chrisl@...nel.org>
>
> Thanks.
>
> >
> > Chris
> >
> > On Fri, Aug 22, 2025 at 12:21 PM Kairui Song <ryncsn@...il.com> wrote:
> > >
> > > From: Kairui Song <kasong@...cent.com>
> > >
> > > Swap cluster setup will try to shuffle the clusters on initialization.
> > > This was helpful for avoiding contention on the swap cache space. The
> > > cluster size (2M) was much smaller than each swap cache space (64M), so
> > > shuffling the clusters meant the allocator would try to allocate swap
> > > slots in different swap cache spaces for each CPU, reducing the chance
> > > of two CPUs using the same swap cache space and hence reducing
> > > contention.
> > >
> > > Now that the swap cache is managed by swap clusters, this shuffle is
> > > pointless. Just remove it, and clean up the related macros.
> > >
> > > This should also improve HDD swap performance, as shuffled IO is a
> > > bad idea for HDDs, and now the shuffling is gone.
> >
> > Do you have any numbers to prove that? :-) Last time, the swap
> > allocator stress testing destroyed two of my SAS drives dedicated
> > to testing, so I am not very keen on running the HDD swap stress
> > test. HDD swap stress tests are super slow to run; they take ages.
>
> I did some tests months ago, and removing the cluster shuffle did help.
> I didn't test it again this time, only did some stress tests. Doing
> performance tests on HDD is really not a good experience, as my HDD
> drives are too old and a long-running test kills them easily.
>
> And I couldn't find any other factor causing a serial HDD IO
> regression; maybe the bot can help verify. If this doesn't help, we'll
> think of something else. But I don't think HDD-based swap will ever
> have practically good performance, as HDDs are terrible at random reads...
>
> Anyway, let me try again with HDD today; maybe I'll get some useful data.
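
The quoted commit message describes the cluster shuffle only in words. Below is a minimal Python sketch of the idea, assuming the shuffle is a simple strided interleave of the free-cluster list. That interleave, and every name in the sketch, is a hypothetical illustration, not the kernel's setup code. With 2M clusters and 64M swap cache spaces, 32 clusters share one space, so handing clusters out in disk order puts the first few CPUs in the same space, while a stride of 32 spreads them across spaces.

```python
# Hypothetical illustration of the cluster "shuffle" from the commit message.
# Sizes follow the message (2M clusters, 64M swap cache spaces) and the 8G
# swap partition used in the test. The strided interleave is an assumption
# made for illustration; it is not the kernel's implementation.

MiB = 1 << 20
CLUSTER_SIZE = 2 * MiB          # swap cluster size (2M)
SPACE_SIZE = 64 * MiB           # one swap cache address space (64M)
SWAP_SIZE = 8 * 1024 * MiB      # 8G swap partition from the test

CLUSTERS_PER_SPACE = SPACE_SIZE // CLUSTER_SIZE   # 32 clusters per space
NR_CLUSTERS = SWAP_SIZE // CLUSTER_SIZE           # 4096 clusters in total


def space_of(cluster: int) -> int:
    """Index of the swap cache space that a cluster's slots fall into."""
    return cluster // CLUSTERS_PER_SPACE


def sequential_order(nr: int) -> list[int]:
    """Free clusters handed out in on-disk order (no shuffle)."""
    return list(range(nr))


def interleaved_order(nr: int, cols: int) -> list[int]:
    """Hand out clusters with a stride of `cols` (0, 32, 64, ..., 1, 33, ...)
    so that neighbours in the free list land in different swap cache spaces."""
    return [c for start in range(cols) for c in range(start, nr, cols)]


def spaces_for_first(order: list[int], ncpus: int) -> list[int]:
    """Swap cache space touched by each CPU if CPU i takes order[i]."""
    return [space_of(c) for c in order[:ncpus]]


if __name__ == "__main__":
    ncpus = 8
    seq = sequential_order(NR_CLUSTERS)
    shuf = interleaved_order(NR_CLUSTERS, CLUSTERS_PER_SPACE)
    print("sequential: ", spaces_for_first(seq, ncpus))
    print("interleaved:", spaces_for_first(shuf, ncpus))
```

Under the same assumption, clusters handed out back to back are 64M apart on disk, which is the kind of seek-heavy pattern the commit message expects to hurt HDDs. Once the swap cache is managed per cluster, spreading CPUs across 64M spaces no longer buys anything.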

So I ran the HDD test for many rounds, basically doing the test from
the URLs below manually. The test uses nr_task = 8, and the HDD swap
partition size is 8G.

Prepare the test following:
https://github.com/intel/lkp-tests/blob/master/setup/swapin_setup
(Make usemem hold 8G of memory and push it out to swap)

Then run the test with:
https://github.com/intel/lkp-tests/blob/master/programs/swapin/run
(Use SIGUSR1 to make usemem read its memory back and swap it in)
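
Each result line below has the form "bytes / usecs = KB/s". As a sketch for checking the logs, the Python snippet reproduces the KB/s figure from the other two fields. The assumption that "KB" means 1024 bytes and that the value is truncated to an integer is inferred from the logged numbers, not taken from the usemem source; the helper names are hypothetical. Note that 1073741824 bytes is 1 GiB per task, so 8 tasks account for the 8G pushed to swap.

```python
# Recompute the per-task KB/s value from a usemem-style result line.
# Assumption (inferred from the logs, not from usemem's source):
# KB = 1024 bytes and the result is truncated to an integer.
import re

LINE_RE = re.compile(r"(\d+) bytes / (\d+) usecs = (\d+) KB/s")


def kib_per_sec(nbytes: int, usecs: int) -> int:
    """Integer KiB/s for nbytes transferred in usecs microseconds."""
    return nbytes * 1_000_000 // (usecs * 1024)


def check_line(line: str) -> bool:
    """True if the KB/s value in a result line matches bytes / usecs."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a result line: {line!r}")
    nbytes, usecs, reported = map(int, m.groups())
    return kib_per_sec(nbytes, usecs) == reported


if __name__ == "__main__":
    sample = "1073741824 bytes / 878662493 usecs = 1193 KB/s"
    print(check_line(sample))
```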

Before this patch:
Test run 1:
1073741824 bytes / 878662493 usecs = 1193 KB/s
33019 usecs to free memory
1073741824 bytes / 891315681 usecs = 1176 KB/s
35144 usecs to free memory
1073741824 bytes / 898801090 usecs = 1166 KB/s
36305 usecs to free memory
1073741824 bytes / 925899753 usecs = 1132 KB/s
20498 usecs to free memory
1073741824 bytes / 927522592 usecs = 1130 KB/s
34397 usecs to free memory
1073741824 bytes / 928164994 usecs = 1129 KB/s
35908 usecs to free memory
1073741824 bytes / 929890294 usecs = 1127 KB/s
35014 usecs to free memory
1073741824 bytes / 929997808 usecs = 1127 KB/s
30491 usecs to free memory
test done

Test run 2:
1073741824 bytes / 771932432 usecs = 1358 KB/s
31194 usecs to free memory
1073741824 bytes / 788739551 usecs = 1329 KB/s
25714 usecs to free memory
1073741824 bytes / 795853979 usecs = 1317 KB/s
33809 usecs to free memory
1073741824 bytes / 798019211 usecs = 1313 KB/s
32019 usecs to free memory
1073741824 bytes / 798771141 usecs = 1312 KB/s
31689 usecs to free memory
1073741824 bytes / 800384757 usecs = 1310 KB/s
32622 usecs to free memory
1073741824 bytes / 800822764 usecs = 1309 KB/s
1073741824 bytes / 800882227 usecs = 1309 KB/s
32789 usecs to free memory
30577 usecs to free memory
test done

Test run 3:
1073741824 bytes / 775202370 usecs = 1352 KB/s
31832 usecs to free memory
1073741824 bytes / 777618372 usecs = 1348 KB/s
30172 usecs to free memory
1073741824 bytes / 778180006 usecs = 1347 KB/s
32482 usecs to free memory
1073741824 bytes / 778521023 usecs = 1346 KB/s
30188 usecs to free memory
1073741824 bytes / 779207791 usecs = 1345 KB/s
29364 usecs to free memory
1073741824 bytes / 780753200 usecs = 1343 KB/s
29860 usecs to free memory
1073741824 bytes / 781078362 usecs = 1342 KB/s
30449 usecs to free memory
1073741824 bytes / 781224993 usecs = 1342 KB/s
19557 usecs to free memory
test done


After this patch:
Test run 1:
1073741824 bytes / 569803736 usecs = 1840 KB/s
29032 usecs to free memory
1073741824 bytes / 573718349 usecs = 1827 KB/s
30399 usecs to free memory
1073741824 bytes / 592070142 usecs = 1771 KB/s
31896 usecs to free memory
1073741824 bytes / 593484694 usecs = 1766 KB/s
30650 usecs to free memory
1073741824 bytes / 596693866 usecs = 1757 KB/s
31582 usecs to free memory
1073741824 bytes / 597359263 usecs = 1755 KB/s
26436 usecs to free memory
1073741824 bytes / 598339187 usecs = 1752 KB/s
30697 usecs to free memory
1073741824 bytes / 598674138 usecs = 1751 KB/s
29791 usecs to free memory
test done

Test run 2:
1073741824 bytes / 578821803 usecs = 1811 KB/s
28433 usecs to free memory
1073741824 bytes / 584262760 usecs = 1794 KB/s
28565 usecs to free memory
1073741824 bytes / 586118970 usecs = 1789 KB/s
27365 usecs to free memory
1073741824 bytes / 589159154 usecs = 1779 KB/s
42645 usecs to free memory
1073741824 bytes / 593487980 usecs = 1766 KB/s
28684 usecs to free memory
1073741824 bytes / 606025290 usecs = 1730 KB/s
28974 usecs to free memory
1073741824 bytes / 607547362 usecs = 1725 KB/s
33221 usecs to free memory
1073741824 bytes / 607882511 usecs = 1724 KB/s
31393 usecs to free memory
test done

Test run 3:
1073741824 bytes / 487637856 usecs = 2150 KB/s
28022 usecs to free memory
1073741824 bytes / 491211037 usecs = 2134 KB/s
28229 usecs to free memory
1073741824 bytes / 527698561 usecs = 1987 KB/s
30265 usecs to free memory
1073741824 bytes / 531719920 usecs = 1972 KB/s
30373 usecs to free memory
1073741824 bytes / 532555758 usecs = 1968 KB/s
30019 usecs to free memory
1073741824 bytes / 532942789 usecs = 1967 KB/s
29354 usecs to free memory
1073741824 bytes / 540793872 usecs = 1938 KB/s
32703 usecs to free memory
1073741824 bytes / 541343777 usecs = 1936 KB/s
33428 usecs to free memory
test done

This seems to match the ~33% swapin.throughput regression reported by
the bot: swapin is about 40% faster with this patch applied. I'll add
this test result to V2.
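
To sanity-check the "about 40% faster" figure, here is a small Python sketch that averages the per-task KB/s values exactly as logged above. Pairing "before run N" with "after run N" is arbitrary and only meant for a rough per-run comparison. By this arithmetic the overall mean rises from about 1271 KB/s to about 1850 KB/s, a gain of roughly 45%, with noticeable variation between runs. It also shows why that is consistent with a ~33% regression: undoing a drop of d requires a gain of 1/(1 - d) - 1, which is about 49% for d = 0.33.

```python
# Rough summary of the HDD swapin results: mean per-task throughput before
# and after the patch, using the KB/s values exactly as logged.
from statistics import mean

BEFORE = {
    1: [1193, 1176, 1166, 1132, 1130, 1129, 1127, 1127],
    2: [1358, 1329, 1317, 1313, 1312, 1310, 1309, 1309],
    3: [1352, 1348, 1347, 1346, 1345, 1343, 1342, 1342],
}
AFTER = {
    1: [1840, 1827, 1771, 1766, 1757, 1755, 1752, 1751],
    2: [1811, 1794, 1789, 1779, 1766, 1730, 1725, 1724],
    3: [2150, 2134, 1987, 1972, 1968, 1967, 1938, 1936],
}


def overall_mean(runs: dict[int, list[int]]) -> float:
    """Mean KB/s over every task of every run."""
    return mean(v for vals in runs.values() for v in vals)


def speedup(before: float, after: float) -> float:
    """Relative throughput gain; 0.45 means 45% faster."""
    return after / before - 1


def recovery_needed(regression: float) -> float:
    """Gain needed to undo a throughput drop of `regression` (0.33 = 33%)."""
    return 1 / (1 - regression) - 1


if __name__ == "__main__":
    b, a = overall_mean(BEFORE), overall_mean(AFTER)
    print(f"before {b:.1f} KB/s, after {a:.1f} KB/s, gain {speedup(b, a):.0%}")
    # Per-run pairing is arbitrary; it only hints at run-to-run variance.
    for run in BEFORE:
        print(f"run {run}: {speedup(mean(BEFORE[run]), mean(AFTER[run])):.0%}")
    print(f"undoing a 33% drop needs a {recovery_needed(0.33):.0%} gain")
```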

