Message-ID: <CACePvbVq3kFtrue2smXRSZ86+EuNVf6q+awQnU-n7=Q4x7U9Lw@mail.gmail.com>
Date: Sun, 9 Nov 2025 23:32:09 -0800
From: Chris Li <chrisl@...nel.org>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Christian Borntraeger <borntraeger@...ux.ibm.com>, Janosch Frank <frankja@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>, David Hildenbrand <david@...hat.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Peter Xu <peterx@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Arnd Bergmann <arnd@...db.de>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>,
Ryan Roberts <ryan.roberts@....com>, Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Lance Yang <lance.yang@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>, Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Matthew Brost <matthew.brost@...el.com>, Joshua Hahn <joshua.hahnjy@...il.com>,
Rakie Kim <rakie.kim@...com>, Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>, Alistair Popple <apopple@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>,
Wei Xu <weixugc@...gle.com>, Kemeng Shi <shikemeng@...weicloud.com>,
Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>,
SeongJae Park <sj@...nel.org>, Matthew Wilcox <willy@...radead.org>, Jason Gunthorpe <jgg@...pe.ca>,
Leon Romanovsky <leon@...nel.org>, Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>, Jann Horn <jannh@...gle.com>,
Miaohe Lin <linmiaohe@...wei.com>, Naoya Horiguchi <nao.horiguchi@...il.com>,
Pedro Falcato <pfalcato@...e.de>, Pasha Tatashin <pasha.tatashin@...een.com>,
Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>, Hugh Dickins <hughd@...gle.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org, linux-s390@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org, linux-arch@...r.kernel.org,
damon@...ts.linux.dev
Subject: Re: [PATCH v2 00/16] mm: remove is_swap_[pte, pmd]() + non-swap
entries, introduce leaf entries
On Sat, Nov 8, 2025 at 9:09 AM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
>
> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).
>
> However, at the same time we also have helper predicates - is_swap_pte(),
> is_swap_pmd() - which are inconsistently used.
>
> This is problematic, as it is logical to assume that should somebody wish
> to operate upon a page table swap entry they should first check to see if
> it is in fact one.
>
> It also implies that perhaps, in future, we might introduce a non-present,
> none page table entry that is not a swap entry.
>
> This series resolves this issue by systematically eliminating all use of
> the is_swap_pte() and is_swap_pmd() predicates so we retain only the
> convention that should a leaf page table entry be neither none nor present
> it is a swap entry.
>
> We also have the further issue that 'swap entry' is unfortunately a really
> rather overloaded term and in fact refers to both entries for swap and for
> other information such as migration entries, page table markers, and device
> private entries.
>
> We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
>
> This series therefore introduces the concept of 'software leaf entries', of
> type softleaf_t, to eliminate this confusion.
>
> A software leaf entry in this sense is any page table entry which is
> non-present, and represented by the softleaf_t type. That is - page table
> leaf entries which are software-controlled by the kernel.
>
> This includes 'none' or empty entries, which are simply represented by a
> zero leaf entry value.
>
> In order to maintain compatibility as we transition the kernel to this new
> type, we simply typedef swp_entry_t to softleaf_t.
Hi Lorenzo,
Sorry I'm late to the party. Can you clarify whether you intend to
replace swp_entry_t completely with softleaf_t?
The traditional usage of swp_entry_t is an entry made up of a swap
device type and a swap device offset. Can we please keep swp_entry_t
for that traditional swap system usage? The mixed type can stay as
softleaf_t at the PTE level.
I would prefer that the swap system keep using swp_entry_t. At least I
don't see any compelling reason to massively rename all the swap system
code when we already know the entry has the limited meaning of a swap
entry (device + offset).
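To make the "device + offset" meaning concrete, here is a minimal
userspace sketch. It is not kernel code: every toy_* name and the 5-bit
type field are assumptions chosen for illustration, and the real bit
layout depends on architecture and configuration. It also models the
compatibility step from the cover letter, where the old type name is a
plain typedef alias of the new one.

```c
#include <assert.h>

/* Toy stand-in for the new type: one opaque word, as in the kernel. */
typedef struct { unsigned long val; } toy_softleaf_t;

/* Compatibility alias: the old name refers to exactly the same type. */
typedef toy_softleaf_t toy_swp_entry_t;

/* Assumption for illustration only: the top 5 bits hold the device type. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (sizeof(unsigned long) * 8 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((1UL << TOY_TYPE_SHIFT) - 1)

/* Pack a swap device type and an offset within that device into one word. */
static toy_swp_entry_t toy_swp_entry(unsigned long type, unsigned long offset)
{
	toy_swp_entry_t entry = {
		(type << TOY_TYPE_SHIFT) | (offset & TOY_OFFSET_MASK)
	};
	return entry;
}

/* Recover the swap device type from an entry. */
static unsigned long toy_swp_type(toy_swp_entry_t entry)
{
	return entry.val >> TOY_TYPE_SHIFT;
}

/* Recover the offset within the swap device from an entry. */
static unsigned long toy_swp_offset(toy_swp_entry_t entry)
{
	return entry.val & TOY_OFFSET_MASK;
}
```

Because a typedef only introduces another name for an existing type, a
toy_swp_entry_t can be passed wherever a toy_softleaf_t is expected
without any conversion. That says nothing about the rest of the series,
which also changes helpers and call sites.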
Timing is not great either. Swap table phase II is under review now,
and phases III and IV are in the backlog. All this renaming can create
unnecessary conflicts. Please reduce the renaming in the swap system
code for now, until we can figure out the impact on the rest of the
swap table series, which is the heavy lifting for swap right now. I
want to draw a line in the sand: on the PTE side, where an entry can
have multiple meanings, we can call it softleaf_t or whatever. Where we
know it has the traditional swap entry meaning, keep it as swp_entry_t
for now, until we figure out the real impact.
Does this renaming cause any change in the generated machine code?
Chris
>
> We introduce a number of predicates and helpers to interact with software
> leaf entries in include/linux/leafops.h which, as it imports swapops.h, can
> be treated as a drop-in replacement for swapops.h wherever leaf entry
> helpers are used.
>
> Since softleaf_from_[pte, pmd]() treats present entries as if they were
> empty/none leaf entries, this allows for a great deal of simplification of
> code throughout the code base, which this series utilises a great deal.
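The simplification described in the quoted paragraph above can be
sketched in userspace as follows. This is a toy model under stated
assumptions, not the contents of leafops.h: the toy_* names, the
single "present" bit and the two-bit kind field are all hypothetical.
The point is only that a conversion which maps present entries to the
empty value can be called unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { unsigned long val; } toy_pte_t;
typedef struct { unsigned long val; } toy_softleaf_t;

/* Assumed toy layout: bit 0 = present, bits 1-2 = kind when non-present. */
#define TOY_PTE_PRESENT 0x1UL
#define TOY_KIND_SHIFT  1
#define TOY_KIND_MASK   0x3UL

enum toy_kind { TOY_NONE = 0, TOY_SWAP = 1, TOY_MIGRATION = 2, TOY_MARKER = 3 };

static bool toy_pte_none(toy_pte_t pte)    { return pte.val == 0; }
static bool toy_pte_present(toy_pte_t pte) { return (pte.val & TOY_PTE_PRESENT) != 0; }

/* Build a non-present toy PTE carrying the given kind. */
static toy_pte_t toy_make_nonpresent(enum toy_kind kind)
{
	toy_pte_t pte = { (unsigned long)kind << TOY_KIND_SHIFT };
	return pte;
}

/*
 * "Safe" conversion: none and present PTEs both yield the zero (empty)
 * entry, so callers need no separate is-this-a-swap-pte check first.
 */
static toy_softleaf_t toy_softleaf_from_pte(toy_pte_t pte)
{
	toy_softleaf_t entry = { 0 };

	if (!toy_pte_none(pte) && !toy_pte_present(pte))
		entry.val = pte.val >> TOY_KIND_SHIFT;
	return entry;
}

static enum toy_kind toy_softleaf_kind(toy_softleaf_t entry)
{
	return (enum toy_kind)(entry.val & TOY_KIND_MASK);
}

static bool toy_softleaf_is_migration(toy_softleaf_t entry)
{
	return toy_softleaf_kind(entry) == TOY_MIGRATION;
}
```

With this shape a caller can write
toy_softleaf_is_migration(toy_softleaf_from_pte(pte)) for any pte. A
present entry whose remaining bits happen to resemble a migration
entry is still reported as empty, because the conversion discards it.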
>
> We additionally change from swap entry to software leaf entry handling
> where it makes sense to and eliminate functions from swapops.h where
> software leaf entries obviate the need for the functions.
>
>
> v2:
> * Folded all fixpatches into patches they fix.
> * Added Vlasta's tag to patch 1 (thanks!)
> * Renamed leaf_entry_t to softleaf_t and leafent_xxx() to softleaf_xxx() as
> a result of discussion between Matthew, Jason, David, Gregory & myself to
> make clearer that we abstract the concept of a software page table leaf
> entry.
> * Updated all commit messages to reference softleaves.
> * Updated the kdoc comment describing softleaf_t to provide more detail.
> * Added a description of softleaves to the top of leafops.h.
>
> non-RFC v1:
> * As part of efforts to eliminate swp_entry_t usage, remove
> pte_none_mostly() and correct UFFD PTE marker handling.
> * Introduce leaf_entry_t - credit to Gregory for naming, and to Jason for
> the concept of simply using a leafent_*() set of functions to interact
> with these entities.
> * Replace pte_to_swp_entry_or_zero() with leafent_from_pte() and simply
> categorise pte_none() cases as an empty leaf entry, as per Jason.
> * Eliminate get_pte_swap_entry() - as we can simply do this with
> leafent_from_pte() also, as discussed with Jason.
> * Put pmd_trans_huge_lock() acquisition/release in pagemap_pmd_range()
> rather than pmd_trans_huge_lock_thp() as per Gregory.
> * Eliminate pmd_to_swp_entry() and related and introduce leafent_from_pmd()
> to replace it and further propagate leaf entry usage.
> * Remove the confusing and unnecessary is_hugetlb_entry_[migration,
> hwpoison]() functions.
> * Replace is_pfn_swap_entry(), pfn_swap_entry_to_page(),
> is_writable_device_private_entry(), is_device_exclusive_entry(),
> is_migration_entry(), is_writable_migration_entry(),
> is_readable_migration_entry(), is_readable_exclusive_migration_entry()
> and pfn_swap_entry_folio() with leafent equivalents.
> * Wrapped up the 'safe' behaviour discussed with Jason in
> leafent_from_[pte, pmd]() so these can be used unconditionally which
> simplifies things a lot.
> * Further changes that are a consequence of the introduction of leaf
> entries.
> https://lore.kernel.org/all/cover.1762171281.git.lorenzo.stoakes@oracle.com/
>
> RFC:
> https://lore.kernel.org/all/cover.1761288179.git.lorenzo.stoakes@oracle.com/
>
> Lorenzo Stoakes (16):
> mm: correctly handle UFFD PTE markers
> mm: introduce leaf entry type and use to simplify leaf entry logic
> mm: avoid unnecessary uses of is_swap_pte()
> mm: eliminate is_swap_pte() when softleaf_from_pte() suffices
> mm: use leaf entries in debug pgtable + remove is_swap_pte()
> fs/proc/task_mmu: refactor pagemap_pmd_range()
> mm: avoid unnecessary use of is_swap_pmd()
> mm/huge_memory: refactor copy_huge_pmd() non-present logic
> mm/huge_memory: refactor change_huge_pmd() non-present logic
> mm: replace pmd_to_swp_entry() with softleaf_from_pmd()
> mm: introduce pmd_is_huge() and use where appropriate
> mm: remove remaining is_swap_pmd() users and is_swap_pmd()
> mm: remove non_swap_entry() and use softleaf helpers instead
> mm: remove is_hugetlb_entry_[migration, hwpoisoned]()
> mm: eliminate further swapops predicates
> mm: replace remaining pte_to_swp_entry() with softleaf_from_pte()
>
> MAINTAINERS | 1 +
> arch/s390/mm/gmap_helpers.c | 20 +-
> arch/s390/mm/pgtable.c | 12 +-
> fs/proc/task_mmu.c | 294 +++++++++-------
> fs/userfaultfd.c | 85 ++---
> include/asm-generic/hugetlb.h | 8 -
> include/linux/huge_mm.h | 48 ++-
> include/linux/hugetlb.h | 2 -
> include/linux/leafops.h | 620 ++++++++++++++++++++++++++++++++++
> include/linux/migrate.h | 2 +-
> include/linux/mm_inline.h | 6 +-
> include/linux/mm_types.h | 25 ++
> include/linux/swapops.h | 273 +--------------
> include/linux/userfaultfd_k.h | 33 +-
> mm/damon/ops-common.c | 6 +-
> mm/debug_vm_pgtable.c | 86 +++--
> mm/filemap.c | 8 +-
> mm/hmm.c | 36 +-
> mm/huge_memory.c | 263 +++++++-------
> mm/hugetlb.c | 165 ++++-----
> mm/internal.h | 20 +-
> mm/khugepaged.c | 33 +-
> mm/ksm.c | 6 +-
> mm/madvise.c | 28 +-
> mm/memory-failure.c | 8 +-
> mm/memory.c | 150 ++++----
> mm/mempolicy.c | 25 +-
> mm/migrate.c | 45 +--
> mm/migrate_device.c | 24 +-
> mm/mincore.c | 25 +-
> mm/mprotect.c | 59 ++--
> mm/mremap.c | 13 +-
> mm/page_table_check.c | 33 +-
> mm/page_vma_mapped.c | 65 ++--
> mm/pagewalk.c | 15 +-
> mm/rmap.c | 17 +-
> mm/shmem.c | 7 +-
> mm/swap_state.c | 12 +-
> mm/swapfile.c | 14 +-
> mm/userfaultfd.c | 53 +--
> 40 files changed, 1560 insertions(+), 1085 deletions(-)
> create mode 100644 include/linux/leafops.h
>
> --
> 2.51.0