Message-ID: <aL7NrhGw5ftOXUZs@MiWiFi-R3L-srv>
Date: Mon, 8 Sep 2025 20:35:58 +0800
From: Baoquan He <bhe@...hat.com>
To: Kairui Song <kasong@...cent.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Matthew Wilcox <willy@...radead.org>,
	Hugh Dickins <hughd@...gle.com>, Chris Li <chrisl@...nel.org>,
	Barry Song <baohua@...nel.org>, Nhat Pham <nphamcs@...il.com>,
	Kemeng Shi <shikemeng@...weicloud.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>,
	Ying Huang <ying.huang@...ux.alibaba.com>,
	Johannes Weiner <hannes@...xchg.org>,
	David Hildenbrand <david@...hat.com>,
	Yosry Ahmed <yosryahmed@...gle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Zi Yan <ziy@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 01/15] docs/mm: add document for swap table

On 09/06/25 at 03:13am, Kairui Song wrote:
> From: Kairui Song <kasong@...cent.com>
> 
> From: Chris Li <chrisl@...nel.org>

The 'From: Author <author@kernel.org>' line can name only one person. Shouldn't
the co-author be credited with "Co-developed-by:" and "Signed-off-by:" tags?

> 
> Swap table is the new swap cache.
> 
> Signed-off-by: Chris Li <chrisl@...nel.org>
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
>  Documentation/mm/swap-table.rst | 72 +++++++++++++++++++++++++++++++++
>  MAINTAINERS                     |  1 +
>  2 files changed, 73 insertions(+)
>  create mode 100644 Documentation/mm/swap-table.rst
> 
> diff --git a/Documentation/mm/swap-table.rst b/Documentation/mm/swap-table.rst
> new file mode 100644
> index 000000000000..929cd91aa984
> --- /dev/null
> +++ b/Documentation/mm/swap-table.rst
> @@ -0,0 +1,72 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +:Author: Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>
> +
> +==========
> +Swap Table
> +==========
> +
> +Swap table implements swap cache as a per-cluster swap cache value array.
> +
> +Swap Entry
> +----------
> +
> +A swap entry contains the information required to serve an anonymous page
> +fault.
> +
> +A swap entry is encoded in two parts: swap type and swap offset.
> +
> +The swap type indicates which swap device to use.
> +The swap offset is the offset within the swap file to read the page data from.
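
To make the two-part encoding concrete, here is a minimal sketch in C. All
names (demo_swp_entry_t, DEMO_TYPE_BITS and so on) and the 5-bit type field
are assumptions made up for illustration. The kernel's real helpers live in
include/linux/swapops.h and their bit layout may differ.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the top 5 bits of a
 * 64-bit value hold the swap type, the remaining bits hold the offset. */
#define DEMO_TYPE_BITS   5
#define DEMO_TYPE_SHIFT  (64 - DEMO_TYPE_BITS)
#define DEMO_TYPE_MASK   ((UINT64_C(1) << DEMO_TYPE_BITS) - 1)
#define DEMO_OFFSET_MASK ((UINT64_C(1) << DEMO_TYPE_SHIFT) - 1)

typedef struct {
	uint64_t val;
} demo_swp_entry_t;

/* Pack a swap type (which device) and a swap offset (where on it). */
static demo_swp_entry_t demo_swp_entry(unsigned int type, uint64_t offset)
{
	demo_swp_entry_t e;

	e.val = (((uint64_t)type & DEMO_TYPE_MASK) << DEMO_TYPE_SHIFT) |
		(offset & DEMO_OFFSET_MASK);
	return e;
}

/* Recover the swap type from a packed entry. */
static unsigned int demo_swp_type(demo_swp_entry_t e)
{
	return (unsigned int)(e.val >> DEMO_TYPE_SHIFT);
}

/* Recover the swap offset from a packed entry. */
static uint64_t demo_swp_offset(demo_swp_entry_t e)
{
	return e.val & DEMO_OFFSET_MASK;
}
```

Packing type 3 and offset 1000 and then unpacking gives back the same two
values, which is all a fault handler needs to find the page data.
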
> +
> +Swap Cache
> +----------
> +
> +Swap cache is a map to look up folios using swap entry as the key. The result
> +value can have one of three possible types, depending on which stage the swap
> +entry is in.
> +
> +1. NULL: This swap entry is not used.
> +
> +2. folio: A folio has been allocated and bound to this swap entry. This is
> +   the transient state of swap out or swap in. The folio data can be in
> +   the folio or swap file, or both.
> +
> +3. shadow: The shadow contains the working set information of the
> +   swapped-out folio. This is the normal state for a swapped-out page.
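
One way to picture the three value types listed above is a single
pointer-sized word with a tag bit. The sketch below is only an assumption
about how such a value could be told apart: the names are hypothetical, and
it relies on folio pointers being at least 2-byte aligned so that the low
bit is free to mark a shadow. The encoding used by the patch series may
differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct folio. */
struct demo_folio {
	int dummy;
};

enum demo_state {
	DEMO_NULL,	/* swap entry is not used */
	DEMO_FOLIO,	/* folio bound to the entry (swap in/out in flight) */
	DEMO_SHADOW,	/* working set info of a swapped-out folio */
};

/* A folio is stored as its (aligned) pointer, so the low bit is 0. */
static uintptr_t demo_from_folio(struct demo_folio *folio)
{
	return (uintptr_t)folio;
}

/* A shadow is stored shifted left by one with the low bit set. */
static uintptr_t demo_from_shadow(uintptr_t workingset)
{
	return (workingset << 1) | 1;
}

/* Decide which of the three types a stored value holds. */
static enum demo_state demo_classify(uintptr_t val)
{
	if (val == 0)
		return DEMO_NULL;
	if (val & 1)
		return DEMO_SHADOW;
	return DEMO_FOLIO;
}
```
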
> +
> +Swap Table
> +----------
> +
> +The previous swap cache was implemented with XArray. The XArray is a tree
> +structure. Each lookup will go through multiple nodes. Can we do better?
> +
> +Notice that most of the time when we look up the swap cache, we are either
> +in a swap in or swap out path. We should already have the swap cluster,
> +which contains the swap entry.
> +
> +If we have a per-cluster array to store swap cache values in the cluster,
> +swap cache lookup within the cluster can be a very simple array lookup.
> +
> +We give such a per-cluster swap cache value array a name: the swap table.
> +
> +Each swap cluster contains 512 entries, so a swap table stores one cluster
> +worth of swap cache values, which is exactly one page. This is not
> +coincidental because the cluster size is determined by the huge page size.
> +The swap table is holding an array of pointers. The pointer has the same
> +size as the PTE. The size of the swap table should match the second
> +last level of the page table page, exactly one page.
> +
> +With the swap table, swap cache lookup achieves great locality and is
> +simpler and faster.
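
The size argument and the array lookup in this section can be sketched as
follows. This assumes 8-byte table entries and 4 KiB pages (so 512 * 8 =
4096 bytes), and it assumes the slot is the swap offset modulo the cluster
size. The demo_ names are hypothetical and not taken from the patch.

```c
#include <stdint.h>

#define DEMO_CLUSTER_ENTRIES 512u	/* entries per cluster, from the text */
#define DEMO_PAGE_SIZE       4096u	/* assumption: 4 KiB pages */

/* One table per cluster: one pointer-sized value per swap entry. */
struct demo_cluster {
	uint64_t table[DEMO_CLUSTER_ENTRIES];
};

/* 512 entries * 8 bytes each is exactly one page. */
_Static_assert(sizeof(struct demo_cluster) == DEMO_PAGE_SIZE,
	       "swap table should occupy exactly one page");

/* Slot of a swap offset inside its cluster (assumed: offset modulo size). */
static unsigned int demo_slot(uint64_t swap_offset)
{
	return (unsigned int)(swap_offset % DEMO_CLUSTER_ENTRIES);
}

/* With the cluster already in hand, lookup is a single array index. */
static uint64_t demo_table_get(const struct demo_cluster *ci,
			       uint64_t swap_offset)
{
	return ci->table[demo_slot(swap_offset)];
}

static void demo_table_set(struct demo_cluster *ci, uint64_t swap_offset,
			   uint64_t val)
{
	ci->table[demo_slot(swap_offset)] = val;
}
```

Compared with a tree walk, the lookup touches one cache line of one page,
which is where the locality claim comes from.
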
> +
> +Locking
> +-------
> +
> +Swap table modification requires taking the cluster lock. If a folio
> +is being added to or removed from the swap table, the folio must be
> +locked prior to the cluster lock. After adding or removing is done, the
> +folio shall be unlocked.
> +
> +Swap table lookup is protected by RCU and atomic read. If the lookup
> +returns a folio, the user must lock the folio before use.
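
The lock ordering described in this section (folio lock first, then cluster
lock) can be sketched in userspace C as below. This is only an illustration
under stated assumptions: the demo_ names are hypothetical, a C11
atomic_flag spinlock stands in for both the folio lock and the cluster
lock, and RCU is not modelled at all. Only the atomic read on the lookup
side is shown.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CLUSTER_ENTRIES 512

/* Toy spinlock standing in for the real folio and cluster locks. */
struct demo_lock {
	atomic_flag held;
};

static void demo_lock_init(struct demo_lock *l)
{
	atomic_flag_clear(&l->held);
}

static void demo_lock_acquire(struct demo_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_lock_release(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct demo_folio {
	struct demo_lock lock;
};

struct demo_cluster {
	struct demo_lock lock;
	_Atomic uintptr_t table[DEMO_CLUSTER_ENTRIES];
};

static void demo_cluster_init(struct demo_cluster *ci)
{
	demo_lock_init(&ci->lock);
	for (size_t i = 0; i < DEMO_CLUSTER_ENTRIES; i++)
		atomic_init(&ci->table[i], 0);
}

/* Add: lock the folio, then the cluster, store, unlock in reverse. */
static void demo_table_add(struct demo_cluster *ci, size_t slot,
			   struct demo_folio *folio)
{
	demo_lock_acquire(&folio->lock);
	demo_lock_acquire(&ci->lock);
	atomic_store_explicit(&ci->table[slot], (uintptr_t)folio,
			      memory_order_release);
	demo_lock_release(&ci->lock);
	demo_lock_release(&folio->lock);
}

/* Remove: same ordering, the slot goes back to the unused (0) state. */
static void demo_table_del(struct demo_cluster *ci, size_t slot,
			   struct demo_folio *folio)
{
	demo_lock_acquire(&folio->lock);
	demo_lock_acquire(&ci->lock);
	atomic_store_explicit(&ci->table[slot], 0, memory_order_release);
	demo_lock_release(&ci->lock);
	demo_lock_release(&folio->lock);
}

/* Lookup: a lockless atomic read. The caller must still lock the
 * returned folio before using it. */
static struct demo_folio *demo_table_lookup(struct demo_cluster *ci,
					    size_t slot)
{
	return (struct demo_folio *)atomic_load_explicit(&ci->table[slot],
							 memory_order_acquire);
}
```
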
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ec19be6c9917..1c8292c0318d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16219,6 +16219,7 @@ R:	Barry Song <baohua@...nel.org>
>  R:	Chris Li <chrisl@...nel.org>
>  L:	linux-mm@...ck.org
>  S:	Maintained
> +F:	Documentation/mm/swap-table.rst
>  F:	include/linux/swap.h
>  F:	include/linux/swapfile.h
>  F:	include/linux/swapops.h
> -- 
> 2.51.0
> 

