lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8cd315b-4902-47a1-b75f-21574a4488c5@lucifer.local>
Date: Mon, 10 Nov 2025 19:38:18 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Kairui Song <ryncsn@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>, Peter Xu <peterx@...hat.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
        Arnd Bergmann <arnd@...db.de>, Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Matthew Brost <matthew.brost@...el.com>,
        Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
        Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
        Ying Huang <ying.huang@...ux.alibaba.com>,
        Alistair Popple <apopple@...dia.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
        Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>,
        Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
        SeongJae Park <sj@...nel.org>, Matthew Wilcox <willy@...radead.org>,
        Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
        Xu Xin <xu.xin16@....com.cn>,
        Chengming Zhou <chengming.zhou@...ux.dev>,
        Jann Horn <jannh@...gle.com>, Miaohe Lin <linmiaohe@...wei.com>,
        Naoya Horiguchi <nao.horiguchi@...il.com>,
        Pedro Falcato <pfalcato@...e.de>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
        Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, linux-s390@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-arch@...r.kernel.org, damon@...ts.linux.dev
Subject: Re: [PATCH v2 04/16] mm: eliminate is_swap_pte() when
 softleaf_from_pte() suffices

On Sun, Nov 09, 2025 at 08:49:02PM +0800, Kairui Song wrote:
> On Sun, Nov 9, 2025 at 2:16 AM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> >
> > In cases where we can simply utilise the fact that softleaf_from_pte()
> > treats present entries as if they were none entries and thus eliminate
> > spurious uses of is_swap_pte(), do so.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > ---
> >  mm/internal.h   |  7 +++----
> >  mm/madvise.c    |  8 +++-----
> >  mm/swap_state.c | 12 ++++++------
> >  mm/swapfile.c   |  9 ++++-----
> >  4 files changed, 16 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 9465129367a4..f0c7461bb02c 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -15,7 +15,7 @@
> >  #include <linux/pagewalk.h>
> >  #include <linux/rmap.h>
> >  #include <linux/swap.h>
> > -#include <linux/swapops.h>
> > +#include <linux/leafops.h>
> >  #include <linux/swap_cgroup.h>
> >  #include <linux/tracepoint-defs.h>
> >
> > @@ -380,13 +380,12 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> >  {
> >         pte_t expected_pte = pte_next_swp_offset(pte);
> >         const pte_t *end_ptep = start_ptep + max_nr;
> > -       swp_entry_t entry = pte_to_swp_entry(pte);
> > +       const softleaf_t entry = softleaf_from_pte(pte);
> >         pte_t *ptep = start_ptep + 1;
> >         unsigned short cgroup_id;
> >
> >         VM_WARN_ON(max_nr < 1);
> > -       VM_WARN_ON(!is_swap_pte(pte));
> > -       VM_WARN_ON(non_swap_entry(entry));
> > +       VM_WARN_ON(!softleaf_is_swap(entry));
> >
> >         cgroup_id = lookup_swap_cgroup_id(entry);
> >         while (ptep < end_ptep) {
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 2d5ad3cb37bb..58d82495b6c6 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -195,7 +195,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> >
> >         for (addr = start; addr < end; addr += PAGE_SIZE) {
> >                 pte_t pte;
> > -               swp_entry_t entry;
> > +               softleaf_t entry;
> >                 struct folio *folio;
> >
> >                 if (!ptep++) {
> > @@ -205,10 +205,8 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> >                 }
> >
> >                 pte = ptep_get(ptep);
> > -               if (!is_swap_pte(pte))
> > -                       continue;
> > -               entry = pte_to_swp_entry(pte);
> > -               if (unlikely(non_swap_entry(entry)))
> > +               entry = softleaf_from_pte(pte);
> > +               if (unlikely(!softleaf_is_swap(entry)))
> >                         continue;
> >
> >                 pte_unmap_unlock(ptep, ptl);
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index d20d238109f9..8881a79f200c 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -12,7 +12,7 @@
> >  #include <linux/kernel_stat.h>
> >  #include <linux/mempolicy.h>
> >  #include <linux/swap.h>
> > -#include <linux/swapops.h>
> > +#include <linux/leafops.h>
> >  #include <linux/init.h>
> >  #include <linux/pagemap.h>
> >  #include <linux/pagevec.h>
> > @@ -732,7 +732,6 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >         pte_t *pte = NULL, pentry;
> >         int win;
> >         unsigned long start, end, addr;
> > -       swp_entry_t entry;
> >         pgoff_t ilx;
> >         bool page_allocated;
> >
> > @@ -744,16 +743,17 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >
> >         blk_start_plug(&plug);
> >         for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> > +               softleaf_t entry;
> > +
> >                 if (!pte++) {
> >                         pte = pte_offset_map(vmf->pmd, addr);
> >                         if (!pte)
> >                                 break;
> >                 }
> >                 pentry = ptep_get_lockless(pte);
> > -               if (!is_swap_pte(pentry))
> > -                       continue;
> > -               entry = pte_to_swp_entry(pentry);
> > -               if (unlikely(non_swap_entry(entry)))
> > +               entry = softleaf_from_pte(pentry);
> > +
> > +               if (!softleaf_is_swap(entry))
>
> Hi Lorenzo,
>
> This part isn't right, is_swap_pte excludes present PTE and non PTE,
> but softleaf_from_pte returns a invalid swap entry from a non PTE.
>
> This may lead to a kernel panic as the invalid swap value will be
> 0x3ffffffffffff on x86_64 (pte_to_swp_entry(0)), the offset value will
> cause out of border access.

Hmm,

static inline softleaf_t softleaf_from_pte(pte_t pte)
{
	softleaf_t arch_entry;

	if (pte_present(pte))
		return softleaf_mk_none();

	pte = pte_swp_clear_flags(pte);
              ^
	      |
For (0) value stays the same.

	arch_entry = __pte_to_swp_entry(pte);
              ^
              |
#define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })

Just grabs the avlue.

	/* Temporary until swp_entry_t eliminated. */
	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
                               ^
                               |
#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))

This will be 0 shifted so 0.

#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)

This however will be a strange value, so this is a point I overlooked.

Presumably this is the 0x3fff...f value you're referring to.

And this has a knock-on effect for softleaf_is_none()... damn.


}

>
> We might need something like this on top of patch 2:
>
> diff --git a/include/linux/leafops.h b/include/linux/leafops.h
> index 1376589d94b0..49de62f96835 100644
> --- a/include/linux/leafops.h
> +++ b/include/linux/leafops.h
> @@ -54,7 +54,7 @@ static inline softleaf_t softleaf_mk_none(void)
>   */
>  static inline softleaf_t softleaf_from_pte(pte_t pte)
>  {
> -       if (pte_present(pte))
> +       if (pte_present(pte) || pte_none(pte))

I was hoping we could avoid this, but in practice on a modern CPU given we're
checking a value in a register against a bit/being empty this should be no
issue.

Will update, also softleaf_from_pmd().

>                 return softleaf_mk_none();
>
>         /* Temporary until swp_entry_t eliminated. */

Thanks, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ