Message-ID: <CANeU7QmcC=-CTmJ7i8R77SQ_WArBvjP3VrmpLOy-b7QhCfMRYA@mail.gmail.com>
Date: Thu, 18 Sep 2025 00:03:20 -0700
From: Chris Li <chrisl@...nel.org>
To: Barry Song <21cnbao@...il.com>
Cc: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>,
Hugh Dickins <hughd@...gle.com>, Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>,
Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>,
David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>,
linux-kernel@...r.kernel.org, Kairui Song <kasong@...cent.com>
Subject: Re: [PATCH v4 01/15] docs/mm: add document for swap table
Hi Barry,
How about this:
A swap table stores one cluster worth of swap cache values, which is
exactly one page table page on most modern 64-bit systems. This is not
coincidental because the cluster size is determined by the huge page size.
The swap table holds an array of pointers, each the same size as a PTE,
so the size of the swap table should match the page table page.
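To make the arithmetic concrete, here is a rough sketch of the size
calculation. The function and parameter names are made up for illustration
and are not kernel symbols; the 4 KiB page size and 8-byte PTE are assumed
values for a typical configuration, and the LPAE numbers are the ones Barry
gave in the quoted thread.

```python
# Illustrative only: table_sizes() and its parameters are hypothetical
# names, not kernel symbols. Page and entry sizes are assumptions.

def table_sizes(page_size, pte_size, pointer_size):
    """Return (entries, pte_table_bytes, swap_table_bytes)."""
    # A last-level page table page holds page_size / pte_size PTEs.
    # The cluster size follows the PMD huge page size, so a cluster
    # covers the same number of pages as one PTE table maps.
    entries = page_size // pte_size
    # The PTE table holds one PTE per entry.
    pte_table_bytes = entries * pte_size
    # The swap table holds one pointer per swap slot in the cluster.
    swap_table_bytes = entries * pointer_size
    return entries, pte_table_bytes, swap_table_bytes

# Typical 64-bit system: 4 KiB pages, 8-byte PTE, 8-byte pointer.
print(table_sizes(4096, 8, 8))

# 32-bit with LPAE: 4 KiB pages, 8-byte PTE, but 4-byte pointer.
print(table_sizes(4096, 8, 4))
```

In the first case both tables come out to one page; in the second the
entry count still matches but the swap table is half the size of the PTE
table, which is the corner case left out of the document.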
If that sounds OK, I will send an incremental patch to Andrew.
Chris
On Wed, Sep 17, 2025 at 10:03 PM Chris Li <chrisl@...nel.org> wrote:
>
> On Wed, Sep 17, 2025 at 4:38 PM Barry Song <21cnbao@...il.com> wrote:
> >
> > > > This approach still seems to work, so the 32-bit system appears to be
> > > > the only exception. However, I’m not entirely sure that your description
> > > > of “the second last level” is correct. I believe it refers to the PTE,
> > > > which corresponds to the last level, not the second-to-last.
> > > > In other words, how do you define the second-to-last level page table?
> > >
> > > The second-to-last level page table page holds the PMD. The last level
> > > page table holds PTE.
> > > Cluster size is HPAGE_PMD_NR = 1<<HPAGE_PMD_ORDER
> > > I was thinking of a PMD entry but the actual page table page it points
> > > to is the last level.
> > > That is a good catch. Let me see how to fix it.
> > >
> > > What I am trying to say is that, swap table size should match to the
> > > PTE page table page size which determines the cluster size. An
> > > alternative to understanding the swap table is that swap table is a
> > > shadow PTE page table containing the shadow PTE matching to the page
> > > that gets swapped out to the swapfile. It is arranged in the swapfile
> > > swap offset order. The intuition is simple once you find the right
> > > angle to view it. However it might be a mouthful to explain.
> > >
> > > I am fine with removing it, on the other hand it removes the only bit
> > > of secret sauce which I try to give the reader a glimpse of my
> > > intuition of the swap table.
> >
> > Perhaps you could describe the swap table as similar to a PTE page table
> > representing the swap cache mapping.
>
> Hard to qualify what is "similar", in what way it is similar.
> Different readers will have different interpretations of what similar
> means to them.
>
> > That is correct for most 32-bit and 64-bit systems,
> > but not for every machine.
>
> I think I will leave it as for most 64 bit systems, the swap table
> size is exactly one page table page size and that is not coincidental.
>
> > The only exception is a 32-bit system with a 64-bit physical address
> > (Large Physical Address Extension, LPAE), which uses a 4 KB PTE table
> > but a 2 KB swap table because the pointer is 32 bit while each page
> > table entry is 64 bit.
>
> I feel that is a very corner case. I will leave it out of the
> document. I want to present a simplified abstracted view. There is
> always more detail to distract the simple abstracted view. That is why
> we have physics.
>
> > Maybe we can simply say that the number of entries in the swap table
> > is the same as in a PTE page table?
>
> Yes, that is what I want to say, for most modern 64 bit systems.
>
> Chris