Message-ID: <CAF8kJuNinT1sZG1edUDMXbdCJ8W_enDcnviAuj5=MViNZ1AczQ@mail.gmail.com>
Date: Tue, 26 Aug 2025 15:00:18 -0700
From: Chris Li <chrisl@...nel.org>
To: Kairui Song <kasong@...cent.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>,
Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>,
David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/9] mm, swap: introduce swap table as swap cache (phase I)
On Fri, Aug 22, 2025 at 12:20 PM Kairui Song <ryncsn@...il.com> wrote:
>
> From: Kairui Song <kasong@...cent.com>
>
> This is the first phase of the bigger series implementing basic
> infrastructures for the Swap Table idea proposed at the LSF/MM/BPF
> topic "Integrate swap cache, swap maps with swap allocator" [1].
>
> Phase I contains 9 patches. It introduces the swap table infrastructure
> and uses it as the swap cache backend. By doing so, we see up to a ~5-20%
> gain in throughput, RPS, or build time across benchmark and workload
> tests. This is based on Chris Li's idea of using cluster-size atomic
> arrays to implement the swap cache, which reduces contention on swap
> cache access. The cluster is much finer-grained than the 64M address
> space split, which is removed in this phase I. It also unifies and
> cleans up the swap code base.

Thanks for making this happen. It has come a long way from my early,
messy experimental patches replacing the xarray in the swap cache.
Beating the original swap_map in terms of memory usage is particularly
hard. I once received feedback from Matthew that anyone who wants to
replace the swap cache is asking for a lot of pain and suffering. He is
absolutely right.
I am so glad that we are finally seeing the light at the other end of
the tunnel. We are close to a state where we can beat the original
swap layer in both memory usage and CPU performance.
Just to recap: the current swap layer's per-slot memory usage is 3 + 8
bytes. The 3 bytes are allocated up front statically: 1 byte for the
swap map and 2 bytes for the swap cgroup. The 8-byte dynamic allocation
comes from the xarray used as the swap cache.
At the end of the full series (27+ patches) we can completely get rid
of the 3 bytes of up-front allocation and only dynamically allocate the
8-byte per-slot entry. That is a straight win in terms of memory
allocation; no compromise was made there.
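For concreteness, here is a rough back-of-the-envelope sketch of that
accounting (a userspace illustration only, not kernel code; the device
size is an arbitrary example):

	#include <stdio.h>

	int main(void)
	{
		/* Example device: 4 GiB of swap with 4 KiB pages = 1M slots (arbitrary). */
		const unsigned long slots = 1UL << 20;
		const unsigned long upfront = 1 + 2;	/* swap_map byte + swap cgroup id */
		const unsigned long dynamic = 8;	/* xarray slot / swap table entry */

		printf("before: %lu B static + %lu B dynamic per slot = %lu MiB + %lu MiB\n",
		       upfront, dynamic,
		       (upfront * slots) >> 20, (dynamic * slots) >> 20);
		/* After the full series: no static part, only the 8-byte dynamic
		 * entry, allocated per 512-entry cluster as it is actually used. */
		printf("after:  %lu B dynamic per slot = %lu MiB\n",
		       dynamic, (dynamic * slots) >> 20);
		return 0;
	}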
The reason we can beat the previous CPU usage is that each cluster has
only 512 entries, much smaller than the 64M xarray tree. The cluster
lock therefore covers far less than the xarray tree lock, and we can do
lockless atomic lookups on the swap cache, which is pretty cool as well.
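To make that lookup structure concrete, here is a minimal userspace
sketch of the idea (my own illustration, not the actual patch; the
names and the use of C11 atomics are assumptions): each cluster owns a
fixed 512-entry atomic array, so a swap cache read is a single atomic
load indexed by the slot's offset within its cluster, with no tree walk
and no tree lock.

	#include <stdatomic.h>

	#define SWAP_CLUSTER_SLOTS 512	/* one swap table per cluster */

	struct swap_table_sketch {
		/* One entry per slot: a folio pointer, a shadow value, or empty. */
		_Atomic(void *) entries[SWAP_CLUSTER_SLOTS];
	};

	/* Lockless read side: a plain atomic load, no tree lock taken. */
	static inline void *swap_table_load(struct swap_table_sketch *table,
					    unsigned long cluster_offset)
	{
		return atomic_load_explicit(&table->entries[cluster_offset],
					    memory_order_acquire);
	}

	/* The write side would still serialize on the per-cluster lock (not shown). */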
I will do one more review pass on this series again soon.
Very exciting.
Chris