lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8d4da271-472b-4a32-9e51-3ff4d8c2e232@lucifer.local>
Date: Mon, 27 Oct 2025 17:33:57 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>, Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Nico Pache <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
        Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
        Lance Yang <lance.yang@...ux.dev>,
        Kemeng Shi <shikemeng@...weicloud.com>,
        Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>,
        Baoquan He <bhe@...hat.com>, Chris Li <chrisl@...nel.org>,
        Peter Xu <peterx@...hat.com>, Matthew Wilcox <willy@...radead.org>,
        Leon Romanovsky <leon@...nel.org>, Muchun Song <muchun.song@...ux.dev>,
        Oscar Salvador <osalvador@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
        Jann Horn <jannh@...gle.com>, Matthew Brost <matthew.brost@...el.com>,
        Joshua Hahn <joshua.hahnjy@...il.com>, Rakie Kim <rakie.kim@...com>,
        Byungchul Park <byungchul@...com>, Gregory Price <gourry@...rry.net>,
        Ying Huang <ying.huang@...ux.alibaba.com>,
        Alistair Popple <apopple@...dia.com>, Pedro Falcato <pfalcato@...e.de>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>,
        kvm@...r.kernel.org, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap
 confusion

(Note I never intended this to be an RFC, it was only because of
series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
there' series rather a practical submission).

To preface, as I said elsewhere, I intend to do more on this, renaming
swp_entry_t to probably leaf_entry_t (thanks Gregory!)

The issue is no matter how I do this people will theorise different
approaches, I'm trying to practically find a way forward that works
iteratively.

There are 502 lines referencing swp_entry_t in the kernel.

Leaving _that_ to another series seems sensible to me.

On Mon, Oct 27, 2025 at 01:09:23PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> > There's an established convention in the kernel that we treat leaf page
> > tables (so far at the PTE, PMD level) as containing 'swap entries' should
> > they be neither empty (i.e. p**_none() evaluating true) nor present
> > (i.e. p**_present() evaluating true).
>
> I have to say I've never liked the none-vs-present naming either.

OK.

I think that's not something we can reasonably get away from practically at
this point.

I don't love 'none' as a thing. Empty would be better but you know
naming... hard.

>
> > This is deeply confusing, so this series goes further and eliminates the
> > non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> > an eye to a new convention of referring to these non-swap 'swap entries' as
> > non-present.
>
> I'm not keen on is_non_present_entry(), it seems confusing again.

What is confusing about it?

The idea is we don't mention swap. If something is non-present and not swap
then it's... a non-present entry.

The logic generally does need to differentiate between the two.

Otherwise we're back to referencing swap again...

>
> It looks like we are stuck with swp_entry_t as the being the handle
> for a non-present pte. Oh well, not a great name, but fine..

We're not stuck with it, I'm just doing things a step at a time. I fully
intend to rename it.

Perfect is the enemy of the good, and this series improves things a lot.

>
> So we think of that swp_entry_t having multiple types: swap, migration,
> device private, etc, etc

I mean do we use fields in the swap entry significantly differently in each
of these types?

I think any change like that would need to be a follow-up series because
that'd involve rewriting a lot of code...

If there's _good reason_ to and we can sensibly stick a like struct entry
in the union we can rename swp_type to leaf_type or something that could be
nice actually.

But maybe make opaque...

But only if there's enough variation in the 'shape' of the data in the swap
entry between different types.

I would need to go check... because if not, then there's no point really.

>
> Then I'd think the general pattern should be to get a swp_entry_t:
>
>     if (pte_present(pte))
>         return;
>     swpent = pte_to_swp_entry(pte);
>
> And then evaluate the type:
>
>     if (swpent_is_swap()) {
>     }

That just embeds the confusion re: swap entry in the function name, no
that's horrible?

We may as well keep non_swap_entry() if we do that right?

Again, I intend to rename swap_entry_t. I'm not doing everything at once
because it's simply not practical.

>
>
> If you keep the naming as "swp_entry" indicates the multi-type value,

Nope don't intend to keep.

> then "swap" can mean a swp_entry which is used by the swap subsystem.
>
> That suggests functions like this:
>
> swpent_is_swap()
> swpent_is_migration()
> ..

The _whole point_ of this series is to separate out the idea that you're
dealing with swap entries so I don't like swpent as a name obviously.

Once we do the rename, we can do something like leafent_is_swap() etc.

And agreed that's neater and more consistent. We'd want to rename all swap
predicates similarly.

We could also just pre-empt and prefix functions with leafent_is_swap() if
you prefer.

We could even do:

/* TODO: Rename swap_entry_t to leaf_entry_t */
typedef swap_entry_t leaf_entry_t;

And use the new type right away.

What do you think?

>
> and your higher level helpers like:
>
> /* True if the pte is a swpent_is_swap() */
> static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> {
>    if (pte_present(pte))
>         return false;
>    *swpent = pte_to_swp_entry(pte);
>    return swpent_is_swap(*swpent);
> }

I already implement in the series a pte_to_swp_entry_or_zero() function
that goes one further - checks pte_present() for you, if pte_none() you
just get an empty swap entry, so this can be:

static inline bool get_pte_swap_entry(pte_t, swp_entry_t *entryp)
{
	*entryp = pte_to_swp_entry_or_zero(pte);
	return is_swap_entry(*entryp);
}

Which is a valid refactoring suggestion but the building blocks are
_already in the series_.

I mean that's valid feedback, that's much nicer, will refactor thanks.

>
> I also think it will be more readable to keep all these things under a
> swpent namespace instead of using unstructured english names.

Nope. Again, the whole point of the series is to avoid referencing
swap. swpent_xxx() is just eliminating the purpose of the series right?

Yes it sucks that the type name is what it is, but this is an iterative
process.

But as above, we could pre-empt future changes and prefix with a
leafent_*() prefix if that works for you?

>
> > * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
> >   to a swap entry if present, or an empty swap entry if none. This is
> >   useful as many swap entry conversions are simply checking for flags for
> >   which this suffices.
>
> I'd expect a safe function should be more like
>
>    *swpent = pte_to_swp_entry_safe(pte);
>    return swpent_is_swap(*swpent);
>
> Where "safe" means that if the PTE is None or Present then
> swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> always nothing.

Not sure it's really 'safe', the name is unfortunate, but you could read
this as 'always get a valid swap entry to operate on'...

I like to encode the fact we are treating pte_none() this way.

pte_to_leaf_entry() is possible?

Or

leaf_entry_t leafent_from_pte()...?

>
> > * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
> >   entry (i.e. not a non-present entry), returning true if so, otherwise
> >   returns false. This simplifies a lot of logic that previously open-coded
> >   this.
>
> Like this is still a tortured function:

This is moot (also not the final version ofc as I get rid of
non_swap_entry()), I agree the approach you suggest is better as per above
so we'll go with that (though without the horrid embeded assignment bit).

>
> +static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> +{
> +       if (pte_present(pte))
> +               return false;
> +       if (pte_none(pte))
> +               return false;
> +
> +       *entryp = pte_to_swp_entry(pte);
> +       if (non_swap_entry(*entryp))
> +               return false;
> +
> +       return true;
> +}
> +
>
> static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> {
>    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> }

I absolutely hate that embedded assignment, but this is equivalent to what
I suggested above, so agreed this is a good suggestion broadly.

>
> Maybe it doesn't even need an inline at that point?

Don't understand what you mean by that. It's in a header file?

>
> > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> >   huge page entry or a huge non-present entry. This again simplifies a lot
> >   of logic that simply open-coded this.
>
> is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> when any of these APIs accept swap entries without being explicit

Again, I'm not going to reference swap in a series intended to eliminate
this, it defeats the purpose.

And the non-present (or whatever you want to call it) entry _is_ huge. So
it's just adding more confusion that way IMO.

>
> Jason

Thanks, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ