Message-ID: <776ac4c8-6e55-468e-bb9d-eea49de0ed89@lucifer.local>
Date: Mon, 15 Dec 2025 10:16:26 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Gregory Price <gourry@...rry.net>
Cc: Jayaraj Rajappan <jayarajpr@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: Calls to do_swap_page() from handle_pte_fault() on a system
where swap is not configured
On Wed, Dec 03, 2025 at 03:18:57AM -0500, Gregory Price wrote:
> On Wed, Dec 03, 2025 at 01:11:24PM +0530, Jayaraj Rajappan wrote:
> > Hi,
> >
> > On a system where swap is not configured, profiling using Linux "perf"
> > tool shows that do_swap_page() gets called from handle_pte_fault().
> > Kernel version is 5.14.0. HugePages are disabled on the system. Trying
> > to understand what could cause do_swap_page() to be called when there
> > is no swap configured on the system.
> >
>
> All a do_swap_page() call means is that the PTE is valid (something is
> there) but not present (the present bit is not set, or whatever condition
> that particular architecture considers "present" is not met).
>
> static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> {
> ...
> if (!vmf->pte)
> return do_pte_missing(vmf);
>
> if (!pte_present(vmf->orig_pte))
> return do_swap_page(vmf);
> }
>
>
> There are other non-swap swap-entries that are handled by the same code:
> transient migrations, device-private entries, etc.
>
> swap-entries are being renamed "softleaf" entries here shortly:
> https://lore.kernel.org/linux-mm/cover.1762812360.git.lorenzo.stoakes@oracle.com/
And now upstream :)
>
> So it means "a software-entry is present" which could be any of:
> enum softleaf_type {
> /* Fundamental types. */
> SOFTLEAF_NONE,
> SOFTLEAF_SWAP,
> /* Migration types. */
> SOFTLEAF_MIGRATION_READ,
> SOFTLEAF_MIGRATION_READ_EXCLUSIVE,
> SOFTLEAF_MIGRATION_WRITE,
> /* Device types. */
> SOFTLEAF_DEVICE_PRIVATE_READ,
> SOFTLEAF_DEVICE_PRIVATE_WRITE,
> SOFTLEAF_DEVICE_EXCLUSIVE,
> /* H/W poison types. */
> SOFTLEAF_HWPOISON,
> /* Marker types. */
> SOFTLEAF_MARKER,
> };
>
Not to toot my own horn but it's kind of nice to have this explicit list of
possibilities here :)
> -------
>
> +Cc Lorenzo:
>
> do_swap_page() is a stale name now probably?
>
> I know there was a hold-off on changing actual swap code, but maybe
> worth changing do_swap_page -> do_softleaf_page?
Yeah I think this probably would be a good idea.
I do want to keep explicit swap state around to minimise changes to the
_actual_ swap code, possibly even having something like:
swp_entry_t softleaf_to_swap(softleaf_t entry)
{
VM_WARN_ON_ONCE(!softleaf_is_swap(entry));
return ...;
}
I'd like to actually make softleaf_t different from swp_entry_t at some
point, just to eliminate confusion between the two (or, rather, at least to
make conversions between them explicit).
>
> Not built or tested at all (sorry, it's 3am insomnia time), basically just:
Ah sorry you have issues with that, me too! Though I've not ML'd at 3am
myself before :P
>
> s/do_swap_page/do_softleaf_page
> s/DO_SWAP_PAGE/DO_SOFTLEAF_PAGE
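(For illustration, the substitution those two patterns describe is purely
mechanical; a sketch of applying both to a sample line with plain sed, not
the exact workflow used for the patch:)

```shell
# Sketch only: apply both renames to one sample line of code.
printf '%s\n' 'ret = do_swap_page(&vmf); /* DO_SWAP_PAGE */' \
  | sed -e 's/do_swap_page/do_softleaf_page/g' \
        -e 's/DO_SWAP_PAGE/DO_SOFTLEAF_PAGE/g'
# prints: ret = do_softleaf_page(&vmf); /* DO_SOFTLEAF_PAGE */
```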
>
> No clue what's up with the special sparc stuff.
:))
But in general LGTM, obviously assuming David's points are addressed.
Is this a patch you'd like to do? Or I can batch up with the next softleaf
series?
Cheers, Lorenzo
>
> ~Gregory
>
> -------
>
> From 92de7131f74b9300ea711ceae98bbe137cf0058f Mon Sep 17 00:00:00 2001
> From: Gregory Price <gourry@...rry.net>
> Date: Wed, 3 Dec 2025 03:12:44 -0500
> Subject: [PATCH] mm: rename do_swap_page to do_softleaf_page
>
> do_swap_page is a stale function name with the introduction of softleaf.
>
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
> Documentation/admin-guide/mm/ksm.rst | 2 +-
> Documentation/gpu/rfc/gpusvm.rst | 2 +-
> .../translations/zh_CN/admin-guide/mm/ksm.rst | 2 +-
> .../translations/zh_TW/admin-guide/mm/ksm.rst | 2 +-
> arch/sparc/include/asm/pgtable_64.h | 4 ++--
> include/linux/ksm.h | 4 ++--
> include/linux/pgtable.h | 10 +++++-----
> mm/internal.h | 2 +-
> mm/khugepaged.c | 6 +++---
> mm/ksm.c | 2 +-
> mm/memory-failure.c | 2 +-
> mm/memory.c | 10 +++++-----
> mm/page_io.c | 2 +-
> mm/rmap.c | 4 ++--
> mm/shmem.c | 2 +-
> mm/swapfile.c | 6 +++---
> mm/zswap.c | 4 ++--
> 17 files changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
> index ad8e7a41f3b5..4c660dbc908e 100644
> --- a/Documentation/admin-guide/mm/ksm.rst
> +++ b/Documentation/admin-guide/mm/ksm.rst
> @@ -286,7 +286,7 @@ cow_ksm
>
> ksm_swpin_copy
> is incremented every time a KSM page is copied when swapping in
> - note that KSM page might be copied when swapping in because do_swap_page()
> + note that KSM page might be copied when swapping in because do_softleaf_page()
> cannot do all the locking needed to reconstitute a cross-anon_vma KSM page.
>
> Advisor
> diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
> index 469db1372f16..3a444bf2768b 100644
> --- a/Documentation/gpu/rfc/gpusvm.rst
> +++ b/Documentation/gpu/rfc/gpusvm.rst
> @@ -14,7 +14,7 @@ Agreed upon design principles
> this path. These are not required and generally a bad idea to
> invent driver defined locks to seal core MM races.
> * An example of a driver-specific lock causing issues occurred before
> - fixing do_swap_page to lock the faulting page. A driver-exclusive lock
> + fixing do_softleaf_page to lock the faulting page. A driver-exclusive lock
> in migrate_to_ram produced a stable livelock if enough threads read
> the faulting page.
> * Partial migration is supported (i.e., a subset of pages attempting to
> diff --git a/Documentation/translations/zh_CN/admin-guide/mm/ksm.rst b/Documentation/translations/zh_CN/admin-guide/mm/ksm.rst
> index 0029c4fd2201..269cb94362ce 100644
> --- a/Documentation/translations/zh_CN/admin-guide/mm/ksm.rst
> +++ b/Documentation/translations/zh_CN/admin-guide/mm/ksm.rst
> @@ -191,7 +191,7 @@ cow_ksm
>
> ksm_swpin_copy
> 在换入时,每次KSM页被复制时都会被递增。请注意,KSM页在换入时可能会被复
> - 制,因为do_swap_page()不能做所有的锁,而需要重组一个跨anon_vma的KSM页。
> + 制,因为do_softleaf_page()不能做所有的锁,而需要重组一个跨anon_vma的KSM页。
>
> --
> Izik Eidus,
> diff --git a/Documentation/translations/zh_TW/admin-guide/mm/ksm.rst b/Documentation/translations/zh_TW/admin-guide/mm/ksm.rst
> index 1b4944b3cf61..afea57754c41 100644
> --- a/Documentation/translations/zh_TW/admin-guide/mm/ksm.rst
> +++ b/Documentation/translations/zh_TW/admin-guide/mm/ksm.rst
> @@ -191,7 +191,7 @@ cow_ksm
>
> ksm_swpin_copy
> 在換入時,每次KSM頁被複制時都會被遞增。請注意,KSM頁在換入時可能會被複
> - 制,因爲do_swap_page()不能做所有的鎖,而需要重組一個跨anon_vma的KSM頁。
> + 制,因爲do_softleaf_page()不能做所有的鎖,而需要重組一個跨anon_vma的KSM頁。
>
> --
> Izik Eidus,
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 615f460c50af..7fee128daf03 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -1054,8 +1054,8 @@ void adi_restore_tags(struct mm_struct *mm, struct vm_area_struct *vma,
> int adi_save_tags(struct mm_struct *mm, struct vm_area_struct *vma,
> unsigned long addr, pte_t oldpte);
>
> -#define __HAVE_ARCH_DO_SWAP_PAGE
> -static inline void arch_do_swap_page(struct mm_struct *mm,
> +#define __HAVE_ARCH_DO_SOFTLEAF_PAGE
> +static inline void arch_do_softleaf_page(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long addr,
> pte_t pte, pte_t oldpte)
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index c982694c987b..a024ca1bae3a 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -81,10 +81,10 @@ static inline void ksm_exit(struct mm_struct *mm)
> }
>
> /*
> - * When do_swap_page() first faults in from swap what used to be a KSM page,
> + * When do_softleaf_page() first faults in from swap what used to be a KSM page,
> * no problem, it will be assigned to this vma's anon_vma; but thereafter,
> * it might be faulted into a different anon_vma (or perhaps to a different
> - * offset in the same anon_vma). do_swap_page() cannot do all the locking
> + * offset in the same anon_vma). do_softleaf_page() cannot do all the locking
> * needed to reconstitute a cross-anon_vma KSM page: for now it has to make
> * a copy, and leave remerging the pages to a later pass of ksmd.
> *
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index b13b6f42be3c..21262969b6b3 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1142,8 +1142,8 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
> }
> #endif
>
> -#ifndef __HAVE_ARCH_DO_SWAP_PAGE
> -static inline void arch_do_swap_page_nr(struct mm_struct *mm,
> +#ifndef __HAVE_ARCH_DO_SOFTLEAF_PAGE
> +static inline void arch_do_softleaf_page_nr(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long addr,
> pte_t pte, pte_t oldpte,
> @@ -1157,17 +1157,17 @@ static inline void arch_do_swap_page_nr(struct mm_struct *mm,
> * page is being swapped out, this metadata must be saved so it can be
> * restored when the page is swapped back in. SPARC M7 and newer
> * processors support an ADI (Application Data Integrity) tag for the
> - * page as metadata for the page. arch_do_swap_page() can restore this
> + * page as metadata for the page. arch_do_softleaf_page() can restore this
> * metadata when a page is swapped back in.
> */
> -static inline void arch_do_swap_page_nr(struct mm_struct *mm,
> +static inline void arch_do_softleaf_page_nr(struct mm_struct *mm,
> struct vm_area_struct *vma,
> unsigned long addr,
> pte_t pte, pte_t oldpte,
> int nr)
> {
> for (int i = 0; i < nr; i++) {
> - arch_do_swap_page(vma->vm_mm, vma, addr + i * PAGE_SIZE,
> + arch_do_softleaf_page(vma->vm_mm, vma, addr + i * PAGE_SIZE,
> pte_advance_pfn(pte, i),
> pte_advance_pfn(oldpte, i));
> }
> diff --git a/mm/internal.h b/mm/internal.h
> index 04c307ee33ae..7019db4f6dd6 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -436,7 +436,7 @@ static inline vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
> return ret;
> }
>
> -vm_fault_t do_swap_page(struct vm_fault *vmf);
> +vm_fault_t do_softleaf_page(struct vm_fault *vmf);
> void folio_rotate_reclaimable(struct folio *folio);
> bool __folio_end_writeback(struct folio *folio);
> void deactivate_file_folio(struct folio *folio);
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 89c33ef7aac3..a6e09e94834c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1007,7 +1007,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> if (!pte++) {
> /*
> * Here the ptl is only used to check pte_same() in
> - * do_swap_page(), so readonly version is enough.
> + * do_softleaf_page(), so readonly version is enough.
> */
> pte = pte_offset_map_ro_nolock(mm, pmd, addr, &ptl);
> if (!pte) {
> @@ -1024,12 +1024,12 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>
> vmf.pte = pte;
> vmf.ptl = ptl;
> - ret = do_swap_page(&vmf);
> + ret = do_softleaf_page(&vmf);
> /* Which unmaps pte (after perhaps re-checking the entry) */
> pte = NULL;
>
> /*
> - * do_swap_page returns VM_FAULT_RETRY with released mmap_lock.
> + * do_softleaf_page returns VM_FAULT_RETRY with released mmap_lock.
> * Note we treat VM_FAULT_RETRY as VM_FAULT_ERROR here because
> * we do not retry here and swap entry will remain in pagetable
> * resulting in later failure.
> diff --git a/mm/ksm.c b/mm/ksm.c
> index cfc182255c7b..002140f01bf5 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3124,7 +3124,7 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
> if (PageHWPoison(page))
> return ERR_PTR(-EHWPOISON);
> if (!folio_test_uptodate(folio))
> - return folio; /* let do_swap_page report the error */
> + return folio; /* let do_softleaf_page report the error */
>
> new_folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, addr);
> if (new_folio &&
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fbc5a01260c8..07e3a17af119 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1104,7 +1104,7 @@ static int me_pagecache_dirty(struct page_state *ps, struct page *p)
> * - but keep in the swap cache, so that when we return to it on
> * a later page fault, we know the application is accessing
> * corrupted data and shall be killed (we installed simple
> - * interception code in do_swap_page to catch it).
> + * interception code in do_softleaf_page to catch it).
> *
> * Clean swap cache pages can be directly isolated. A later page fault will
> * bring in the known good data from disk.
> diff --git a/mm/memory.c b/mm/memory.c
> index 6675e87eb7dd..6d31eda830a5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3371,7 +3371,7 @@ int apply_to_existing_page_range(struct mm_struct *mm, unsigned long addr,
> * handle_pte_fault chooses page fault handler according to an entry which was
> * read non-atomically. Before making any commitment, on those architectures
> * or configurations (e.g. i386 with PAE) which might give a mix of unmatched
> - * parts, do_swap_page must check under lock before unmapping the pte and
> + * parts, do_softleaf_page must check under lock before unmapping the pte and
> * proceeding (but do_wp_page is only called after already making such a check;
> * and do_anonymous_page can safely check later on).
> */
> @@ -4569,7 +4569,7 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
> goto fallback;
>
> /*
> - * For do_swap_page, find the highest order where the aligned range is
> + * For do_softleaf_page, find the highest order where the aligned range is
> * completely swap entries with contiguous swap offsets.
> */
> order = highest_order(orders);
> @@ -4618,7 +4618,7 @@ static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
> * We return with the mmap_lock locked or unlocked in the same cases
> * as does filemap_fault().
> */
> -vm_fault_t do_swap_page(struct vm_fault *vmf)
> +vm_fault_t do_softleaf_page(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> struct folio *swapcache, *folio = NULL;
> @@ -5008,7 +5008,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> VM_BUG_ON(!folio_test_anon(folio) ||
> (pte_write(pte) && !PageAnonExclusive(page)));
> set_ptes(vma->vm_mm, address, ptep, pte, nr_pages);
> - arch_do_swap_page_nr(vma->vm_mm, vma, address,
> + arch_do_softleaf_page_nr(vma->vm_mm, vma, address,
> pte, pte, nr_pages);
>
> folio_unlock(folio);
> @@ -6234,7 +6234,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> return do_pte_missing(vmf);
>
> if (!pte_present(vmf->orig_pte))
> - return do_swap_page(vmf);
> + return do_softleaf_page(vmf);
>
> if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> return do_numa_page(vmf);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 3c342db77ce3..3bcc8487b600 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -514,7 +514,7 @@ static bool swap_read_folio_zeromap(struct folio *folio)
> /*
> * Swapping in a large folio that is partially in the zeromap is not
> * currently handled. Return true without marking the folio uptodate so
> - * that an IO error is emitted (e.g. do_swap_page() will sigbus).
> + * that an IO error is emitted (e.g. do_softleaf_page() will sigbus).
> */
> if (WARN_ON_ONCE(swap_zeromap_batch(folio->swap, nr_pages,
> &is_zeromap) != nr_pages))
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f955f02d570e..21c3a40ee824 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2535,7 +2535,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>
> /*
> * Store the pfn of the page in a special migration
> - * pte. do_swap_page() will wait until the migration
> + * pte. do_softleaf_page() will wait until the migration
> * pte is removed and then restart fault handling.
> */
> if (writable)
> @@ -2755,7 +2755,7 @@ struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr,
>
> /*
> * Store the pfn of the page in a special device-exclusive PFN swap PTE.
> - * do_swap_page() will trigger the conversion back while holding the
> + * do_softleaf_page() will trigger the conversion back while holding the
> * folio lock.
> */
> entry = make_device_exclusive_entry(page_to_pfn(page));
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad18172ff831..7fa65a8501b4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2092,7 +2092,7 @@ static struct folio *shmem_swap_alloc_folio(struct inode *inode,
> * we may need to copy to a suitable page before moving to filecache.
> *
> * In a future release, this may well be extended to respect cpuset and
> - * NUMA mempolicy, and applied also to anonymous pages in do_swap_page();
> + * NUMA mempolicy, and applied also to anonymous pages in do_softleaf_page();
> * but for now it is a simple matter of zone.
> */
> static bool shmem_should_replace_folio(struct folio *folio, gfp_t gfp)
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d12332423a06..40039586f56e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1568,7 +1568,7 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
> * example, the following situation is possible.
> *
> * CPU1 CPU2
> - * do_swap_page()
> + * do_softleaf_page()
> * ... swapoff+swapon
> * __read_swap_cache_async()
> * swapcache_prepare()
> @@ -1578,7 +1578,7 @@ static unsigned char swap_entry_put_locked(struct swap_info_struct *si,
> *
> * In __swap_duplicate(), the swap_map need to be checked before
> * changing partly because the specified swap entry may be for another
> - * swap device which has been swapoff. And in do_swap_page(), after
> + * swap device which has been swapoff. And in do_softleaf_page(), after
> * the page is read from the swap device, the PTE is verified not
> * changed with the page table locked to check whether the swap device
> * has been swapoff or swapoff+swapon.
> @@ -2201,7 +2201,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
> rmap_t rmap_flags = RMAP_NONE;
>
> /*
> - * See do_swap_page(): writeback would be problematic.
> + * See do_softleaf_page(): writeback would be problematic.
> * However, we do a folio_wait_writeback() just before this
> * call and have the folio locked.
> */
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 5d0f8b13a958..db85ad97ccdb 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1586,13 +1586,13 @@ bool zswap_store(struct folio *folio)
> *
> * -EIO: if the swapped out content was in zswap, but could not be loaded
> * into the page due to a decompression failure. The folio is unlocked, but
> - * NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
> + * NOT marked up-to-date, so that an IO error is emitted (e.g. do_softleaf_page()
> * will SIGBUS).
> *
> * -EINVAL: if the swapped out content was in zswap, but the page belongs
> * to a large folio, which is not supported by zswap. The folio is unlocked,
> * but NOT marked up-to-date, so that an IO error is emitted (e.g.
> - * do_swap_page() will SIGBUS).
> + * do_softleaf_page() will SIGBUS).
> *
> * -ENOENT: if the swapped out content was not in zswap. The folio remains
> * locked on return.
> --
> 2.52.0
>