Message-ID: <20240221125753.GQ13330@nvidia.com>
Date: Wed, 21 Feb 2024 08:57:53 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
James Houghton <jthoughton@...gle.com>,
David Hildenbrand <david@...hat.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Yang Shi <shy828301@...il.com>, linux-riscv@...ts.infradead.org,
Andrew Morton <akpm@...ux-foundation.org>,
"Aneesh Kumar K . V" <aneesh.kumar@...nel.org>,
Rik van Riel <riel@...riel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Mike Rapoport <rppt@...nel.org>, John Hubbard <jhubbard@...dia.com>,
Vlastimil Babka <vbabka@...e.cz>,
Michael Ellerman <mpe@...erman.id.au>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Andrew Jones <andrew.jones@...ux.dev>,
linuxppc-dev@...ts.ozlabs.org,
Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <muchun.song@...ux.dev>,
linux-arm-kernel@...ts.infradead.org,
Christoph Hellwig <hch@...radead.org>,
Lorenzo Stoakes <lstoakes@...il.com>,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v2 03/13] mm: Provide generic pmd_thp_or_huge()

On Wed, Feb 21, 2024 at 05:37:37PM +0800, Peter Xu wrote:
> On Mon, Jan 15, 2024 at 01:55:51PM -0400, Jason Gunthorpe wrote:
> > On Wed, Jan 03, 2024 at 05:14:13PM +0800, peterx@...hat.com wrote:
> > > From: Peter Xu <peterx@...hat.com>
> > >
> > > ARM defines pmd_thp_or_huge(), detecting either a THP or a huge PMD. It
> > > can be a helpful helper if we want to merge more THP and hugetlb code
> > > paths. Make it a generic default implementation, only exist when
> > > CONFIG_MMU. Arch can overwrite it by defining its own version.
> > >
> > > For example, ARM's pgtable-2level.h defines it to always return false.
> > >
> > > Keep the macro declared with all config, it should be optimized to a false
> > > anyway if !THP && !HUGETLB.
> > >
> > > Signed-off-by: Peter Xu <peterx@...hat.com>
> > > ---
> > > include/linux/pgtable.h | 4 ++++
> > > mm/gup.c | 3 +--
> > > 2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > > index 466cf477551a..2b42e95a4e3a 100644
> > > --- a/include/linux/pgtable.h
> > > +++ b/include/linux/pgtable.h
> > > @@ -1362,6 +1362,10 @@ static inline int pmd_write(pmd_t pmd)
> > > #endif /* pmd_write */
> > > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> > >
> > > +#ifndef pmd_thp_or_huge
> > > +#define pmd_thp_or_huge(pmd) (pmd_huge(pmd) || pmd_trans_huge(pmd))
> > > +#endif
> >
> > Why not just use pmd_leaf() ?
> >
> > This GUP case seems to me exactly like what pmd_leaf() should really
> > do and be used for..
>
> I think I mostly agree with you, and these APIs are indeed confusing. IMHO
> the challenge is about the risk of breaking others on small changes in the
> details where evil resides.

These APIs are super confusing, which is why I brought it up.. Adding
even more subtly different variations is not helping.

I think pmd_leaf means the entry is present and refers to a physical
page not another radix level.

> > eg x86 does:
> >
> > #define pmd_leaf pmd_large
> > static inline int pmd_large(pmd_t pte)
> > return pmd_flags(pte) & _PAGE_PSE;
> >
> > static inline int pmd_trans_huge(pmd_t pmd)
> > return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE;
> >
> > int pmd_huge(pmd_t pmd)
> > return !pmd_none(pmd) &&
> > (pmd_val(pmd) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
>
> For example, here I don't think it's strictly pmd_leaf()? As pmd_huge()
> will return true if PRESENT=0 && PSE=0 (as long as none pte ruled out
> first), while pmd_leaf() will return false; I think that came from
> cbef8478bee5.

Yikes, but do you even want to handle non-present entries in GUP
world? Isn't everything gated by !present in the first place?

> Besides that, there're also other cases where it's not clear of such direct
> replacement, not until further investigated. E.g., arm-3level has:
>
> #define pmd_leaf(pmd) pmd_sect(pmd)
> #define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
> PMD_TYPE_SECT)
> #define PMD_TYPE_SECT (_AT(pmdval_t, 1) << 0)
>
> While pmd_huge() there relies on PMD_TABLE_BIT ()

I looked at that, it looked OK..

#define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
#define PMD_TABLE_BIT (_AT(pmdval_t, 1) << 1)

It is the same stuff, just a little confusingly written

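
As a sketch of that comparison, the following user-space model checks
the two arm-3level predicates against each other. The three masks are
the ones quoted above. The body of the pmd_huge() model (non-zero value
with PMD_TABLE_BIT clear) is an assumption about the arm implementation,
since the thread only says it relies on PMD_TABLE_BIT, and the
model_* names are hypothetical.

```c
#include <stdint.h>

#define MODEL_PMD_TYPE_MASK (UINT64_C(3) << 0)
#define MODEL_PMD_TYPE_SECT (UINT64_C(1) << 0)
#define MODEL_PMD_TABLE_BIT (UINT64_C(1) << 1)

/* pmd_leaf == pmd_sect: low two bits must be exactly 01. */
int model_pmd_sect(uint64_t v)
{
	return (v & MODEL_PMD_TYPE_MASK) == MODEL_PMD_TYPE_SECT;
}

/* Assumed form of pmd_huge(): non-zero and table bit clear. */
int model_pmd_huge_arm(uint64_t v)
{
	return v != 0 && !(v & MODEL_PMD_TABLE_BIT);
}

/* Returns 1 when both predicates give the same answer for v. */
int model_arm_predicates_agree(uint64_t v)
{
	return model_pmd_sect(v) == model_pmd_huge_arm(v);
}
```

Under these assumptions the two agree whenever bit 0 is set (01 is a
section, 11 is a table) and for the 10 pattern, and they differ only for
a non-zero entry whose low two bits are 00, which again is the
non-present case.
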
Jason