[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8cd315b-4902-47a1-b75f-21574a4488c5@lucifer.local>
Date: Mon, 10 Nov 2025 19:38:18 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Kairui Song <ryncsn@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Peter Xu <peterx@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Arnd Bergmann <arnd@...db.de>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
Lance Yang <lance.yang@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
Oscar Salvador <osalvador@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Matthew Brost <matthew.brost@...el.com>,
Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
Ying Huang <ying.huang@...ux.alibaba.com>,
Alistair Popple <apopple@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
SeongJae Park <sj@...nel.org>, Matthew Wilcox <willy@...radead.org>,
Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
Xu Xin <xu.xin16@....com.cn>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Jann Horn <jannh@...gle.com>, Miaohe Lin <linmiaohe@...wei.com>,
Naoya Horiguchi <nao.horiguchi@...il.com>,
Pedro Falcato <pfalcato@...e.de>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, linux-s390@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, damon@...ts.linux.dev
Subject: Re: [PATCH v2 04/16] mm: eliminate is_swap_pte() when
softleaf_from_pte() suffices
On Sun, Nov 09, 2025 at 08:49:02PM +0800, Kairui Song wrote:
> On Sun, Nov 9, 2025 at 2:16 AM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> >
> > In cases where we can simply utilise the fact that softleaf_from_pte()
> > treats present entries as if they were none entries and thus eliminate
> > spurious uses of is_swap_pte(), do so.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > ---
> > mm/internal.h | 7 +++----
> > mm/madvise.c | 8 +++-----
> > mm/swap_state.c | 12 ++++++------
> > mm/swapfile.c | 9 ++++-----
> > 4 files changed, 16 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 9465129367a4..f0c7461bb02c 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -15,7 +15,7 @@
> > #include <linux/pagewalk.h>
> > #include <linux/rmap.h>
> > #include <linux/swap.h>
> > -#include <linux/swapops.h>
> > +#include <linux/leafops.h>
> > #include <linux/swap_cgroup.h>
> > #include <linux/tracepoint-defs.h>
> >
> > @@ -380,13 +380,12 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> > {
> > pte_t expected_pte = pte_next_swp_offset(pte);
> > const pte_t *end_ptep = start_ptep + max_nr;
> > - swp_entry_t entry = pte_to_swp_entry(pte);
> > + const softleaf_t entry = softleaf_from_pte(pte);
> > pte_t *ptep = start_ptep + 1;
> > unsigned short cgroup_id;
> >
> > VM_WARN_ON(max_nr < 1);
> > - VM_WARN_ON(!is_swap_pte(pte));
> > - VM_WARN_ON(non_swap_entry(entry));
> > + VM_WARN_ON(!softleaf_is_swap(entry));
> >
> > cgroup_id = lookup_swap_cgroup_id(entry);
> > while (ptep < end_ptep) {
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 2d5ad3cb37bb..58d82495b6c6 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -195,7 +195,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> >
> > for (addr = start; addr < end; addr += PAGE_SIZE) {
> > pte_t pte;
> > - swp_entry_t entry;
> > + softleaf_t entry;
> > struct folio *folio;
> >
> > if (!ptep++) {
> > @@ -205,10 +205,8 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> > }
> >
> > pte = ptep_get(ptep);
> > - if (!is_swap_pte(pte))
> > - continue;
> > - entry = pte_to_swp_entry(pte);
> > - if (unlikely(non_swap_entry(entry)))
> > + entry = softleaf_from_pte(pte);
> > + if (unlikely(!softleaf_is_swap(entry)))
> > continue;
> >
> > pte_unmap_unlock(ptep, ptl);
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index d20d238109f9..8881a79f200c 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -12,7 +12,7 @@
> > #include <linux/kernel_stat.h>
> > #include <linux/mempolicy.h>
> > #include <linux/swap.h>
> > -#include <linux/swapops.h>
> > +#include <linux/leafops.h>
> > #include <linux/init.h>
> > #include <linux/pagemap.h>
> > #include <linux/pagevec.h>
> > @@ -732,7 +732,6 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> > pte_t *pte = NULL, pentry;
> > int win;
> > unsigned long start, end, addr;
> > - swp_entry_t entry;
> > pgoff_t ilx;
> > bool page_allocated;
> >
> > @@ -744,16 +743,17 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
> >
> > blk_start_plug(&plug);
> > for (addr = start; addr < end; ilx++, addr += PAGE_SIZE) {
> > + softleaf_t entry;
> > +
> > if (!pte++) {
> > pte = pte_offset_map(vmf->pmd, addr);
> > if (!pte)
> > break;
> > }
> > pentry = ptep_get_lockless(pte);
> > - if (!is_swap_pte(pentry))
> > - continue;
> > - entry = pte_to_swp_entry(pentry);
> > - if (unlikely(non_swap_entry(entry)))
> > + entry = softleaf_from_pte(pentry);
> > +
> > + if (!softleaf_is_swap(entry))
>
> Hi Lorenzo,
>
> This part isn't right, is_swap_pte excludes present PTE and non PTE,
> but softleaf_from_pte returns a invalid swap entry from a non PTE.
>
> This may lead to a kernel panic as the invalid swap value will be
> 0x3ffffffffffff on x86_64 (pte_to_swp_entry(0)), the offset value will
> cause out of border access.
Hmm,
static inline softleaf_t softleaf_from_pte(pte_t pte)
{
softleaf_t arch_entry;
if (pte_present(pte))
return softleaf_mk_none();
pte = pte_swp_clear_flags(pte);
^
|
For (0) value stays the same.
arch_entry = __pte_to_swp_entry(pte);
^
|
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val((pte)) })
Just grabs the avlue.
/* Temporary until swp_entry_t eliminated. */
return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
^
|
#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
This will be 0 shifted so 0.
#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
This however will be a strange value, so this is a point I overlooked.
Presumably this is the 0x3fff...f value you're referring to.
And this has a knock-on effect for softleaf_is_none()... damn.
}
>
> We might need something like this on top of patch 2:
>
> diff --git a/include/linux/leafops.h b/include/linux/leafops.h
> index 1376589d94b0..49de62f96835 100644
> --- a/include/linux/leafops.h
> +++ b/include/linux/leafops.h
> @@ -54,7 +54,7 @@ static inline softleaf_t softleaf_mk_none(void)
> */
> static inline softleaf_t softleaf_from_pte(pte_t pte)
> {
> - if (pte_present(pte))
> + if (pte_present(pte) || pte_none(pte))
I was hoping we could avoid this, but in practice on a modern CPU given we're
checking a value in a register against a bit/being empty this should be no
issue.
Will update, also softleaf_from_pmd().
> return softleaf_mk_none();
>
> /* Temporary until swp_entry_t eliminated. */
Thanks, Lorenzo
Powered by blists - more mailing lists