Message-Id: <20260126-swap-table-p3-v1-0-a74155fab9b0@tencent.com>
Date: Mon, 26 Jan 2026 01:57:23 +0800
From: Kairui Song <ryncsn@...il.com>
To: linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, linux-kernel@...r.kernel.org,
Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>
Subject: [PATCH 00/12] mm, swap: swap table phase III: remove swap_map
This series is based on phase II, which is still in mm-unstable.
This series removes the static swap_map and uses the swap table for the
swap count directly. This saves roughly 30% of the memory used for static
swap metadata; for example, it saves 256MB of memory when mounting a 1TB
swap device. Performance is slightly better too, since the double update
of the swap table and swap_map is now gone.
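To make the idea concrete, here is a minimal userspace sketch of the
approach (not the actual kernel code; every name in it, such as
slot_entry_t and COUNT_MASK, is made up for illustration, and the
saturation / count-continuation handling the kernel needs is omitted):

/*
 * Conceptual sketch: keep the per-slot swap count inside the per-slot
 * table entry instead of a separate 1-byte-per-slot swap_map array.
 *
 * Rough arithmetic behind the numbers above: a 1TB device has
 * 1TB / 4KB = 256M slots, so a 1-byte swap_map costs 256MB of static
 * metadata that disappears once the count lives in the table entry.
 */
#include <stdatomic.h>

typedef atomic_ulong slot_entry_t;      /* one word per swap slot */

#define COUNT_BITS      6UL
#define COUNT_MASK      ((1UL << COUNT_BITS) - 1)

/* Swap count packed in the low bits; upper bits hold other state. */
static inline unsigned long slot_count(slot_entry_t *slot)
{
        return atomic_load(slot) & COUNT_MASK;
}

/* Take one reference on the slot (overflow handling omitted). */
static inline void slot_get(slot_entry_t *slot)
{
        atomic_fetch_add(slot, 1);
}

/* Drop one reference; returns the new count so callers can free at 0. */
static inline unsigned long slot_put(slot_entry_t *slot)
{
        return (atomic_fetch_sub(slot, 1) - 1) & COUNT_MASK;
}

With the count carried by the entry itself, a single atomic update can
replace the previous pair of updates to the swap table and swap_map.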
Test results:
Mounting a swap device:
=======================
Mount a 1TB brd device as SWAP, just to verify the memory savings:
`free -m` before:
               total        used        free      shared  buff/cache   available
Mem:            1465        1051         417           1          61         413
Swap:        1054435           0     1054435

`free -m` after:
               total        used        free      shared  buff/cache   available
Mem:            1465         795         672           1          62         670
Swap:        1054435           0     1054435
Idle memory usage is reduced by ~256MB, just as expected. Following this
design, we should be able to save another ~512MB in the next phase.
Build kernel test:
==================
Test using ZSWAP with NVMe SWAP, make -j48, defconfig, in an x86_64 VM
with 5G RAM, under global memory pressure, average of 32 test runs:

                  Before         After
System time:      1038.97s       1013.75s (-2.4%)
Test using ZRAM as SWAP, make -j12, tinyconfig, in an ARM64 VM with 1.5G
RAM, under global memory pressure, average of 32 test runs:

                  Before         After
System time:      67.75s         66.65s (-1.6%)

The results are slightly better in both cases.
Redis / Valkey benchmark:
=========================
Test using ZRAM as SWAP, in an ARM64 VM with 1.5G RAM, under global
memory pressure, average of 64 test runs:
Server: valkey-server --maxmemory 2560M
Client: redis-benchmark -r 3000000 -n 3000000 -d 1024 -c 12 -P 32 -t get
             no persistence              with BGSAVE
Before:      472705.71 RPS               369451.68 RPS
After:       481197.93 RPS (+1.8%)       374922.32 RPS (+1.5%)
In conclusion, performance is better in all cases, and memory usage is
much lower.
The swap cgroup array will also be merged into the swap table in a later
phase, saving the remaining ~60% of the static swap metadata and making
all swap metadata dynamic. The improved API for swap operations also
reduces lock contention and makes more batched operations possible.
Suggested-by: Chris Li <chrisl@...nel.org>
Signed-off-by: Kairui Song <kasong@...cent.com>
---
Kairui Song (12):
mm, swap: protect si->swap_file properly and use as a mount indicator
mm, swap: clean up swapon process and locking
mm, swap: remove redundant arguments and locking for enabling a device
mm, swap: consolidate bad slots setup and make it more robust
mm/workingset: leave highest bits empty for anon shadow
mm, swap: implement helpers for reserving data in the swap table
mm, swap: mark bad slots in swap table directly
mm, swap: simplify swap table sanity range check
mm, swap: use the swap table to track the swap count
mm, swap: no need to truncate the scan border
mm, swap: simplify checking if a folio is swapped
mm, swap: no need to clear the shadow explicitly
include/linux/swap.h | 28 +-
mm/memory.c | 2 +-
mm/swap.h | 20 +-
mm/swap_state.c | 72 ++--
mm/swap_table.h | 131 +++++-
mm/swapfile.c | 1104 +++++++++++++++++++++-----------------------------
mm/workingset.c | 49 ++-
7 files changed, 653 insertions(+), 753 deletions(-)
---
base-commit: 10de4550639e9df9242e32e9affc90ed75a27c7d
change-id: 20251216-swap-table-p3-8de73fee7b5f
Best regards,
--
Kairui Song <kasong@...cent.com>