linux-kernel - [PATCH v3 0/2] mm/zswap: optimize for dynamic zswap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240210-zswap-global-lru-v3-0-200495333595@bytedance.com>
Date: Fri, 16 Feb 2024 08:55:03 +0000
From: Chengming Zhou <zhouchengming@...edance.com>
To: Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>, Nhat Pham <nphamcs@...il.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, Yosry Ahmed <yosryahmed@...gle.com>, linux-kernel@...r.kernel.org,
 Chengming Zhou <zhouchengming@...edance.com>
Subject: [PATCH v3 0/2] mm/zswap: optimize for dynamic zswap_pools

Changes in v3:
- Improve the commit messages and comments, per Yosry.
- Use percpu_ref_is_zero() for debug purpose, per Yosry.
- Collect tag.
- Link to v2: https://lore.kernel.org/r/20240210-zswap-global-lru-v2-0-fbee3b11a62e@bytedance.com

Changes in v2:
- fix build error when !CONFIG_MEMCG_KMEM.
- make zswap struct static and fix some error paths, per Yosry.
- add another shrink_lock to protect zswap.next_shrink, per Yosry.
- keep "WARN_ON(percpu_ref_tryget(&pool->ref))" in pool release path
  for debug, per Nhat.
- improve the commit messages.
- Link to v1: https://lore.kernel.org/r/20240210-zswap-global-lru-v1-0-853473d7b0da@bytedance.com

Dynamic pool creation has been supported for a long time, which maybe
not used so much in practice. But with the per-memcg lru merged, the
current structure of zswap_pool's lru and shrinker become less optimal.

In the current structure, each zswap_pool has its own lru, shrinker and
shrink_work, but only the latest zswap_pool will be the current used.

1. When memory has pressure, all shrinkers of zswap_pools will try to
   shrink its lru list, there is no order between them.

2. When zswap limit hit, only the last zswap_pool's shrink_work will
   try to shrink its own lru, which is inefficient.

A more natural way is to have a global zswap lru shared between all
zswap_pools, and so is the shrinker. The code becomes much simpler too.

Another optimization is changing zswap_pool kref to percpu_ref, which
will be taken reference by every zswap entry. So the scalability is
better.

Testing kernel build (32 threads) in tmpfs with memory.max=2GB.
(zswap shrinker and writeback enabled with one 50GB swapfile,
on a 128 CPUs x86-64 machine, below is the average of 5 runs)

        mm-unstable  zswap-global-lru
real    63.20        63.12
user    1061.75      1062.95
sys     268.74       264.44

Signed-off-by: Chengming Zhou <zhouchengming@...edance.com>
---
Chengming Zhou (2):
      mm/zswap: global lru and shrinker shared by all zswap_pools
      mm/zswap: change zswap_pool kref to percpu_ref

 mm/zswap.c | 207 +++++++++++++++++++++++++++----------------------------------
 1 file changed, 93 insertions(+), 114 deletions(-)
---
base-commit: 191d97734e41a5c9f90a2f6636fdd335ae1d435d
change-id: 20240210-zswap-global-lru-94d49316178b

Best regards,
-- 
Chengming Zhou <zhouchengming@...edance.com>