Message-ID: <CAGsJ_4wKWem-STYAnh_0EgSFKzzs1M1c7wz6K82wLt6T6JEw9A@mail.gmail.com>
Date: Wed, 17 Sep 2025 07:09:27 +0800
From: Barry Song <21cnbao@...il.com>
To: Chris Li <chrisl@...nel.org>
Cc: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>, 
	Hugh Dickins <hughd@...gle.com>, Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	linux-kernel@...r.kernel.org, Kairui Song <kasong@...cent.com>
Subject: Re: [PATCH v4 01/15] docs/mm: add document for swap table

On Wed, Sep 17, 2025 at 6:42 AM Chris Li <chrisl@...nel.org> wrote:
>
> On Tue, Sep 16, 2025 at 3:00 PM Barry Song <21cnbao@...il.com> wrote:
> >
> > On Wed, Sep 17, 2025 at 12:01 AM Kairui Song <ryncsn@...il.com> wrote:
> > >
> > > From: Chris Li <chrisl@...nel.org>
> > >
> > > Swap table is the new swap cache.
> > >
> > > Signed-off-by: Chris Li <chrisl@...nel.org>
> > > Signed-off-by: Kairui Song <kasong@...cent.com>
> > > ---
> > >  Documentation/mm/index.rst      |  1 +
> > >  Documentation/mm/swap-table.rst | 72 +++++++++++++++++++++++++++++++++
> > >  MAINTAINERS                     |  1 +
> > >  3 files changed, 74 insertions(+)
> > >  create mode 100644 Documentation/mm/swap-table.rst
> > >
> > > diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
> > > index fb45acba16ac..828ad9b019b3 100644
> > > --- a/Documentation/mm/index.rst
> > > +++ b/Documentation/mm/index.rst
> > > @@ -57,6 +57,7 @@ documentation, or deleted if it has served its purpose.
> > >     page_table_check
> > >     remap_file_pages
> > >     split_page_table_lock
> > > +   swap-table
> > >     transhuge
> > >     unevictable-lru
> > >     vmalloced-kernel-stacks
> > > diff --git a/Documentation/mm/swap-table.rst b/Documentation/mm/swap-table.rst
> > > new file mode 100644
> > > index 000000000000..acae6ceb4f7b
> > > --- /dev/null
> > > +++ b/Documentation/mm/swap-table.rst
> > > @@ -0,0 +1,72 @@
> > > +.. SPDX-License-Identifier: GPL-2.0
> > > +
> > > +:Author: Chris Li <chrisl@...nel.org>, Kairui Song <kasong@...cent.com>
> > > +
> > > +==========
> > > +Swap Table
> > > +==========
> > > +
> > > +Swap table implements swap cache as a per-cluster swap cache value array.
> > > +
> > > +Swap Entry
> > > +----------
> > > +
> > > +A swap entry contains the information required to serve the anonymous page
> > > +fault.
> > > +
> > > +Swap entry is encoded as two parts: swap type and swap offset.
> > > +
> > > +The swap type indicates which swap device to use.
> > > +The swap offset is the offset of the swap file to read the page data from.
> > > +
> > > +Swap Cache
> > > +----------
> > > +
> > > +Swap cache is a map to look up folios using swap entry as the key. The result
> > > +value can have three possible types depending on which stage of this swap entry
> > > +was in.
> > > +
> > > +1. NULL: This swap entry is not used.
> > > +
> > > +2. folio: A folio has been allocated and bound to this swap entry. This is
> > > +   the transient state of swap out or swap in. The folio data can be in
> > > +   the folio or swap file, or both.
> >
> > This doesn’t look quite right.
> >
> > the folio’s data must reside within the folio itself?
>
> For swap out that is true. In the swap in case, you allocate the
> folio first, then read the data from the swap file into the folio. There
> is a window where the swap file has the data and the folio does not.
>
> > The data might also be in a swap file, or not.
>
> The data only in swap file is covered by "data can be in the folio or
> swap file", it is an OR relationship.
>
> I think my previous statement still stands correct considering both
> swap out and swap in. Of course there is always room for improvement
> to make it more clear. But folio always has the data is not true for
> swap in. If you have other ways to improve it, please feel free to
> suggest.

I assume you’re referring to the swapin case where a folio has been
allocated and added to the swap cache, but the read from the swap file is
still in progress, so the folio does not hold the data yet?

Maybe it could be something like:
The data may already be in the folio, or will be placed there later. It
may also reside in the swap file.

Alternatively, leave it unchanged.

>
>
> > On a 32-bit system, I’m guessing the swap table is 2 KB, which is about
> > half of a page?
>
> Yes, true. I considered that but decided to leave it out of the document.
> There are a lot of other implementation details the document does not
> cover, not just this aspect. This document provides a simple
> abstracted view (it might not cover all the detailed cases). One way to
> address that is to add a qualification "on a 64 bit system". What do you
> say? I don't want to talk about the 32 bit system having half of a
> page in this document, I consider that too much detail. The 32 bit
> system is pretty rare nowadays.

I’d prefer that we remove all descriptions about matching PAGE_SIZE,
since we would need to double-check every case, like 16 KB or 64 KB pages.

For ARM64 with a 16 KB page size, the last-level index uses bits 24:14.
With a 64 KB page size, it uses bits 28:16 [1]. In both cases, 512 entries
do not make up one page.

[1] https://developer.arm.com/documentation/101811/0104/Translation-granule
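
As a rough sanity check of the sizes discussed above, here is a small
Python sketch (not kernel code). It assumes a swap table holds 512
entries and that one entry is pointer-sized (8 bytes on 64-bit, 4 bytes
on 32-bit). All function and constant names are hypothetical. The 4 KB
row (bits 20:12) is added only for comparison and is not taken from the
discussion above.

```python
# Sketch only, not kernel code. All names here are hypothetical.
# Assumptions: a swap table has 512 entries, and one entry is
# pointer-sized (8 bytes on 64-bit, 4 bytes on 32-bit).
KB = 1024
ENTRIES_PER_TABLE = 512


def swap_table_bytes(entry_size):
    """Size in bytes of one 512-entry table."""
    return ENTRIES_PER_TABLE * entry_size


def last_level_entries(high_bit, low_bit):
    """Entries indexed by address bits [high_bit:low_bit], inclusive."""
    return 1 << (high_bit - low_bit + 1)


def fills_one_page(entry_size, page_size):
    """True if a 512-entry table is exactly one page."""
    return swap_table_bytes(entry_size) == page_size


if __name__ == "__main__":
    # (label, page size, entry size, last-level index bits)
    configs = [
        ("64-bit, 4 KB pages", 4 * KB, 8, (20, 12)),
        ("64-bit, 16 KB pages", 16 * KB, 8, (24, 14)),
        ("64-bit, 64 KB pages", 64 * KB, 8, (28, 16)),
    ]
    for label, page_size, entry_size, (hi, lo) in configs:
        print(label,
              "table bytes:", swap_table_bytes(entry_size),
              "last-level entries:", last_level_entries(hi, lo),
              "one page:", fills_one_page(entry_size, page_size))
    # 32-bit with 4 KB pages: 512 * 4 = 2048 bytes, half a page.
    print("32-bit, 4 KB pages",
          "table bytes:", swap_table_bytes(4),
          "one page:", fills_one_page(4, 4 * KB))
```

Under these assumptions, a 512-entry table matches the page size only
for 64-bit with 4 KB pages, which is why a blanket "one page" statement
would need a qualifier.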

Thanks
Barry
