Message-ID: <CAGsJ_4w2GqGj8HZMfwndsWu7qkORqsnaw9WwhmQS=pW4gR7nEA@mail.gmail.com>
Date: Wed, 17 Sep 2025 05:59:54 +0800
From: Barry Song <21cnbao@...il.com>
To: Kairui Song <ryncsn@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Matthew Wilcox <willy@...radead.org>, Hugh Dickins <hughd@...gle.com>, Chris Li <chrisl@...nel.org>, 
	Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	linux-kernel@...r.kernel.org, Kairui Song <kasong@...cent.com>
Subject: Re: [PATCH v4 01/15] docs/mm: add document for swap table

On Wed, Sep 17, 2025 at 12:01 AM Kairui Song <ryncsn@...il.com> wrote:
>
> From: Chris Li <chrisl@...nel.org>
>
> Swap table is the new swap cache.
>
> Signed-off-by: Chris Li <chrisl@...nel.org>
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
>  Documentation/mm/index.rst      |  1 +
>  Documentation/mm/swap-table.rst | 72 +++++++++++++++++++++++++++++++++
>  MAINTAINERS                     |  1 +
>  3 files changed, 74 insertions(+)
>  create mode 100644 Documentation/mm/swap-table.rst
>
> diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
> index fb45acba16ac..828ad9b019b3 100644
> --- a/Documentation/mm/index.rst
> +++ b/Documentation/mm/index.rst
> @@ -57,6 +57,7 @@ documentation, or deleted if it has served its purpose.
>     page_table_check
>     remap_file_pages
>     split_page_table_lock
> +   swap-table
>     transhuge
>     unevictable-lru
>     vmalloced-kernel-stacks
> diff --git a/Documentation/mm/swap-table.rst b/Documentation/mm/swap-table.rst
> new file mode 100644
> index 000000000000..acae6ceb4f7b
> --- /dev/null
> +++ b/Documentation/mm/swap-table.rst
> @@ -0,0 +1,72 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +:Author: Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>
> +
> +==========
> +Swap Table
> +==========
> +
> +The swap table implements the swap cache as a per-cluster array of swap
> +cache values.
> +
> +Swap Entry
> +----------
> +
> +A swap entry contains the information required to serve an anonymous page
> +fault.
> +
> +A swap entry is encoded in two parts: swap type and swap offset.
> +
> +The swap type indicates which swap device to use.
> +The swap offset is the offset within the swap file to read the page data from.
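To make the two-part encoding concrete, here is a minimal C sketch. The bit layout (5 type bits at the high end of a 64-bit value, the offset in the remaining low bits) and all `demo_*` names are hypothetical and chosen for illustration; the kernel's real swap entry layout differs by architecture and configuration.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the swap type sits in the
 * top DEMO_TYPE_BITS bits and the swap offset in the remaining low bits.
 */
#define DEMO_TYPE_BITS		5
#define DEMO_OFFSET_BITS	(64 - DEMO_TYPE_BITS)
#define DEMO_OFFSET_MASK	((UINT64_C(1) << DEMO_OFFSET_BITS) - 1)

typedef struct {
	uint64_t val;
} demo_swap_entry;

/* Pack a swap device index (type) and a page offset into one entry. */
static demo_swap_entry demo_make_entry(unsigned int type, uint64_t offset)
{
	demo_swap_entry e = {
		.val = ((uint64_t)type << DEMO_OFFSET_BITS) |
		       (offset & DEMO_OFFSET_MASK),
	};
	return e;
}

/* Which swap device (swap file or partition) holds the data. */
static unsigned int demo_entry_type(demo_swap_entry e)
{
	return (unsigned int)(e.val >> DEMO_OFFSET_BITS);
}

/* Which page-sized slot within that device to read the data from. */
static uint64_t demo_entry_offset(demo_swap_entry e)
{
	return e.val & DEMO_OFFSET_MASK;
}
```

For example, `demo_make_entry(3, 12345)` yields an entry whose type is 3 and whose offset is 12345. Keeping the type in the high bits means the offset can be recovered with a plain low-bit mask.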
> +
> +Swap Cache
> +----------
> +
> +The swap cache is a map for looking up folios, using the swap entry as the
> +key. The resulting value can have one of three types, depending on which
> +stage the swap entry is in.
> +
> +1. NULL: This swap entry is not used.
> +
> +2. folio: A folio has been allocated and bound to this swap entry. This is
> +   the transient state of swap out or swap in. The folio data can be in
> +   the folio or swap file, or both.

This doesn’t look quite right.

The folio’s data must reside within the folio itself, right?
The data might also be in a swap file, or not.

> +
> +3. shadow: The shadow contains the working set information of the
> +   swapped-out folio. This is the normal state for a swapped-out page.
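One way a single slot could tell the three states above apart is pointer tagging. The sketch below is loosely modeled on XArray value entries (`xa_mk_value()` stores a value shifted left by one with bit 0 set); the `demo_*` names and this exact encoding are assumptions for illustration, not the swap table's actual format.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the kernel's struct folio; illustration only. */
struct folio;

/* One swap cache slot: 0, a folio pointer, or a tagged shadow value. */
typedef uintptr_t demo_cache_val;

enum demo_cache_state {
	DEMO_EMPTY,	/* 1. NULL: swap entry not used */
	DEMO_FOLIO,	/* 2. folio bound to the swap entry */
	DEMO_SHADOW,	/* 3. shadow: working set info of a swapped-out folio */
};

/* Folio pointers are at least 2-byte aligned, so bit 0 is always clear. */
static demo_cache_val demo_val_from_folio(struct folio *folio)
{
	return (uintptr_t)folio;
}

/* Shadows are shifted left by one and tagged with bit 0 set. */
static demo_cache_val demo_val_from_shadow(unsigned long shadow)
{
	return ((uintptr_t)shadow << 1) | 1;
}

static enum demo_cache_state demo_val_state(demo_cache_val v)
{
	if (v == 0)
		return DEMO_EMPTY;
	if (v & 1)
		return DEMO_SHADOW;
	return DEMO_FOLIO;
}

/* Returns the folio, or NULL if the slot does not hold one. */
static struct folio *demo_val_to_folio(demo_cache_val v)
{
	return demo_val_state(v) == DEMO_FOLIO ? (struct folio *)v : NULL;
}

/* Undo the shadow tagging; only meaningful when the state is DEMO_SHADOW. */
static unsigned long demo_val_to_shadow(demo_cache_val v)
{
	return (unsigned long)(v >> 1);
}
```

The design choice here is that a slot needs no separate type field: a zero word, an untagged pointer and a tagged integer are distinguishable from the value alone, which keeps each slot pointer-sized.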
> +
> +Swap Table Internals
> +--------------------
> +
> +The previous swap cache was implemented with an XArray. The XArray is a tree
> +structure, so each lookup goes through multiple nodes. Can we do better?
> +
> +Notice that most of the time when we look up the swap cache, we are either
> +in the swap in or swap out path, where we should already have the swap
> +cluster that contains the swap entry.
> +
> +If we have a per-cluster array to store the swap cache values of the cluster,
> +swap cache lookup within the cluster can be a very simple array lookup.
> +
> +We give such a per-cluster swap cache value array a name: the swap table.
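A minimal sketch of the lookup this enables, assuming 512 entries per cluster as stated in the patch text. The `demo_cluster` and `demo_swap_device` structs are hypothetical simplifications; the kernel's per-cluster structure carries more state than just the table.

```c
#include <stddef.h>

/* Assumption: 512 swap entries per cluster, as in the patch text. */
#define DEMO_CLUSTER_ENTRIES 512

/* Hypothetical cluster: just the per-cluster swap cache value array. */
struct demo_cluster {
	void *table[DEMO_CLUSTER_ENTRIES];	/* the "swap table" */
};

/* Hypothetical swap device: a flat array of clusters. */
struct demo_swap_device {
	struct demo_cluster *clusters;
	size_t nr_clusters;
};

/* Find the cluster that holds a given swap offset (NULL if out of range). */
static struct demo_cluster *demo_offset_to_cluster(struct demo_swap_device *si,
						   unsigned long offset)
{
	size_t idx = offset / DEMO_CLUSTER_ENTRIES;

	return idx < si->nr_clusters ? &si->clusters[idx] : NULL;
}

/*
 * Once the cluster is known, a swap cache lookup is a single array index
 * instead of a walk through several tree nodes.
 */
static void *demo_table_get(struct demo_cluster *ci, unsigned long offset)
{
	return ci->table[offset % DEMO_CLUSTER_ENTRIES];
}

static void demo_table_set(struct demo_cluster *ci, unsigned long offset,
			   void *val)
{
	ci->table[offset % DEMO_CLUSTER_ENTRIES] = val;
}
```

For example, swap offset 1030 lands in cluster 2 (1030 / 512) at slot 6 (1030 % 512). In the swap in and swap out paths the cluster is typically already at hand, so only the second step is needed.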
> +
> +Each swap cluster contains 512 entries, so a swap table stores one cluster's
> +worth of swap cache values, which is exactly one page. This is not
> +coincidental, because the cluster size is determined by the huge page size.
> +The swap table holds an array of pointers, and each pointer has the same
> +size as a PTE. The size of the swap table should therefore match that of a
> +second-to-last-level page table page: exactly one page.

On a 32-bit system, I’m guessing the swap table is 2 KB, which is half
of a page?
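The arithmetic behind this question, as a small C helper. It assumes 512 entries per cluster and a 4 KB page, as in the patch text, so the pointer size is the only input that differs between 64-bit (8 bytes) and 32-bit (4 bytes). The helper names are hypothetical.

```c
#include <stddef.h>

/* Bytes for one cluster's swap table: one pointer-sized slot per entry. */
static size_t demo_table_bytes(size_t entries_per_cluster, size_t ptr_size)
{
	return entries_per_cluster * ptr_size;
}

/* Percentage of a page of page_size bytes filled by one swap table. */
static unsigned int demo_table_page_percent(size_t entries_per_cluster,
					    size_t ptr_size, size_t page_size)
{
	return (unsigned int)(demo_table_bytes(entries_per_cluster, ptr_size)
			      * 100 / page_size);
}
```

With 8-byte pointers the table is 4096 bytes and fills the page; with 4-byte pointers it is 2048 bytes, half of the page.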

> +
> +With the swap table, swap cache lookup can achieve great locality and become
> +simpler and faster.
> +

Thanks
Barry
