lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240711-swap-allocator-v4-0-0295a4d4c7aa@kernel.org>
Date: Thu, 11 Jul 2024 00:29:04 -0700
From: Chris Li <chrisl@...nel.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Kairui Song <kasong@...cent.com>, Hugh Dickins <hughd@...gle.com>, 
 Ryan Roberts <ryan.roberts@....com>, "Huang, Ying" <ying.huang@...el.com>, 
 Kalesh Singh <kaleshsingh@...gle.com>, linux-kernel@...r.kernel.org, 
 linux-mm@...ck.org, Chris Li <chrisl@...nel.org>, 
 Barry Song <baohua@...nel.org>
Subject: [PATCH v4 0/3] mm: swap: mTHP swap allocator base on swap cluster
 order

This is the short term solutions "swap cluster order" listed
in my "Swap Abstraction" discussion slice 8 in the recent
LSF/MM conference.

When commit 845982eb264bc "mm: swap: allow storage of all mTHP
orders" is introduced, it only allocates the mTHP swap entries
from the new empty cluster list.  It has a fragmentation issue
reported by Barry.

https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/

The reason is that all the empty clusters have been exhausted while
there are plenty of free swap entries in the cluster that are
not 100% free.

Remember the swap allocation order in the cluster.
Keep track of the per order non full cluster list for later allocation.

The patch 3 of this series gives the swap SSD allocation
a new separate code path from the HDD allocation. The new allocator
use cluster list only and do not global scan swap_map[] without lock
any more.

This streamline the swap allocation for SSD. The code matches the execution
flow much better.

User impact: For users that allocate and free mix order mTHP swapping,
It greatly improves the success rate of the mTHP swap allocation after the
initial phase.

It also performs faster when the swapfile is close to full, because the
allocator can get the non full cluster from a list rather than scanning
a lot of swap_map entries. 

This series still lacks the swap cache reclaim feature. The reclaim series
of patches are under development and testing right now. Will post the
mail list soon. For this reason, the patch 3 is consider RFC and not
ready to merge.

With Barry's mthp test program V2:

Without:
$ ./thp_swap_allocator_test -a
Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71%
Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00%
...
Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test -a -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

$ ./thp_swap_allocator_test
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%

With:
$ ./thp_swap_allocator_test -a
Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 3: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
...
Iteration 98: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 99: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 100: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%

$ ./thp_swap_allocator_test -a -s
Iteration 1: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 3: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 4: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 5: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 6: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 7: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 8: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 9: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 10: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 11: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 12: swpout inc: 232, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 13: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33%
Iteration 15: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 16: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 17: swpout inc: 212, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 18: swpout inc: 234, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 19: swpout inc: 220, swpout fallback inc: 6, Fallback percentage: 2.65%
Iteration 20: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 21: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 22: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 23: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 24: swpout inc: 232, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 25: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 26: swpout inc: 230, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 27: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 29: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 30: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 31: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 32: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 33: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 34: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 35: swpout inc: 230, swpout fallback inc: 3, Fallback percentage: 1.29%
Iteration 36: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 37: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 38: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 39: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 40: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 41: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 42: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 43: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 44: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 45: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 46: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90%
Iteration 47: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 48: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 49: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 50: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 51: swpout inc: 224, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 52: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 53: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 54: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 55: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 56: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 57: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 58: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 59: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 60: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 61: swpout inc: 217, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 62: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 63: swpout inc: 223, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 64: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 65: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 66: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 67: swpout inc: 220, swpout fallback inc: 2, Fallback percentage: 0.90%
Iteration 68: swpout inc: 224, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 69: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 70: swpout inc: 219, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 71: swpout inc: 225, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 72: swpout inc: 231, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 73: swpout inc: 218, swpout fallback inc: 5, Fallback percentage: 2.24%
Iteration 74: swpout inc: 223, swpout fallback inc: 5, Fallback percentage: 2.19%
Iteration 75: swpout inc: 222, swpout fallback inc: 7, Fallback percentage: 3.06%
Iteration 76: swpout inc: 226, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 77: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 78: swpout inc: 215, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 79: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
Iteration 80: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 81: swpout inc: 218, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 82: swpout inc: 228, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 83: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 84: swpout inc: 222, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 85: swpout inc: 213, swpout fallback inc: 1, Fallback percentage: 0.47%
Iteration 86: swpout inc: 215, swpout fallback inc: 8, Fallback percentage: 3.59%
Iteration 87: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 88: swpout inc: 227, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 89: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63%
Iteration 90: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 91: swpout inc: 214, swpout fallback inc: 1, Fallback percentage: 0.47%
Iteration 92: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 93: swpout inc: 221, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 94: swpout inc: 223, swpout fallback inc: 2, Fallback percentage: 0.89%
Iteration 95: swpout inc: 222, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 96: swpout inc: 223, swpout fallback inc: 4, Fallback percentage: 1.76%
Iteration 97: swpout inc: 223, swpout fallback inc: 7, Fallback percentage: 3.04%
Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 99: swpout inc: 229, swpout fallback inc: 1, Fallback percentage: 0.43%
Iteration 100: swpout inc: 229, swpout fallback inc: 0, Fallback percentage: 0.00%

$ ./thp_swap_allocator_test      
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 134, swpout fallback inc: 98, Fallback percentage: 42.24%
Iteration 3: swpout inc: 72, swpout fallback inc: 154, Fallback percentage: 68.14%
Iteration 4: swpout inc: 40, swpout fallback inc: 183, Fallback percentage: 82.06%
Iteration 5: swpout inc: 27, swpout fallback inc: 199, Fallback percentage: 88.05%
Iteration 6: swpout inc: 22, swpout fallback inc: 202, Fallback percentage: 90.18%
Iteration 7: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
Iteration 8: swpout inc: 14, swpout fallback inc: 214, Fallback percentage: 93.86%
Iteration 9: swpout inc: 5, swpout fallback inc: 221, Fallback percentage: 97.79%
Iteration 10: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
...
Iteration 97: swpout inc: 12, swpout fallback inc: 207, Fallback percentage: 94.52%
Iteration 98: swpout inc: 8, swpout fallback inc: 219, Fallback percentage: 96.48%
Iteration 99: swpout inc: 16, swpout fallback inc: 218, Fallback percentage: 93.16%
Iteration 100: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%

$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 84, swpout fallback inc: 148, Fallback percentage: 63.79%
Iteration 3: swpout inc: 39, swpout fallback inc: 195, Fallback percentage: 83.33%
Iteration 4: swpout inc: 16, swpout fallback inc: 217, Fallback percentage: 93.13%
Iteration 5: swpout inc: 11, swpout fallback inc: 214, Fallback percentage: 95.11%
Iteration 6: swpout inc: 10, swpout fallback inc: 218, Fallback percentage: 95.61%
...
Iteration 96: swpout inc: 5, swpout fallback inc: 225, Fallback percentage: 97.83%
Iteration 97: swpout inc: 2, swpout fallback inc: 215, Fallback percentage: 99.08%
Iteration 98: swpout inc: 2, swpout fallback inc: 220, Fallback percentage: 99.10%
Iteration 99: swpout inc: 4, swpout fallback inc: 222, Fallback percentage: 98.23%
Iteration 100: swpout inc: 3, swpout fallback inc: 221, Fallback percentage: 98.66%

Kernel compile under tmpfs with cgroup memory.max = 2G.
12 core 24 hyperthreading, 32 jobs.

HDD swap 3 runs average, 20G swap file:

Without:
user	4186.290
system	421.743
real	597.317

With:
user	4113.897
system	413.123
real	659.543

SSD swap 10 runs average, 20G swap partition:

Without:
user	4736.810
system	500.921
real	250.243
 
With:
user	4729.478
system	500.265
real	249.633

Two zram swap:
zram0 1.4G zram1 20G.
The idea is forcing the zram0 almost
full then overflow to zram1:

Two zram 10 runs average:

Without:
user	4600.693
system	384.105
real	238.735

With:
user	4604.502
system	382.087
real	239.063

Reported-by: Barry Song <21cnbao@...il.com>
Signed-off-by: Chris Li <chrisl@...nel.org>
---
Changes in v4:
- Remove a warning in patch 2.
- Allocating from the free cluster list before the nonfull list. Revert the v3 behavior.
- Add cluster_index and cluster_offset function.
- Patch 3 has a new allocating path for SSD.
- HDD swap allocation does not need to consider clusters any more.

Changes in v3:
- Using V1 as base.
- Rename "next" to "list" for the list field, suggested by Ying.
- Update comment for the locking rules for cluster fields and list,
  suggested by Ying.
- Allocate from the nonfull list before attempting free list, suggested
  by Kairui.
- Link to v2: https://lore.kernel.org/r/20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org

Changes in v2:
- Abandoned.
- Link to v1: https://lore.kernel.org/r/20240524-swap-allocator-v1-0-47861b423b26@kernel.org

---
Chris Li (3):
      mm: swap: swap cluster switch to double link list
      mm: swap: mTHP allocate swap entries from nonfull list
      RFC: mm: swap: seperate SSD allocation from scan_swap_map_slots()

 include/linux/swap.h |  30 ++--
 mm/swapfile.c        | 490 +++++++++++++++++++++++----------------------------
 2 files changed, 238 insertions(+), 282 deletions(-)
---
base-commit: ff3a648ecb9409aff1448cf4f6aa41d78c69a3bc
change-id: 20240523-swap-allocator-1534c480ece4

Best regards,
-- 
Chris Li <chrisl@...nel.org>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ