Message-ID: <ZWDog_cz66g38d0I@x1n>
Date: Fri, 24 Nov 2023 13:16:35 -0500
From: Peter Xu <peterx@...hat.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>,
"Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>
Cc: Christoph Hellwig <hch@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andrea Arcangeli <aarcange@...hat.com>,
James Houghton <jthoughton@...gle.com>,
Lorenzo Stoakes <lstoakes@...il.com>,
David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
John Hubbard <jhubbard@...dia.com>,
Yang Shi <shy828301@...il.com>,
Rik van Riel <riel@...riel.com>,
Hugh Dickins <hughd@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Jason Gunthorpe <jgg@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>,
Mike Rapoport <rppt@...nel.org>,
Mike Kravetz <mike.kravetz@...cle.com>
Subject: Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in
hugepd processing
Hi, Christophe, Michael, Aneesh,
[I'll reply to everyone together here]
On Fri, Nov 24, 2023 at 07:03:11AM +0000, Christophe Leroy wrote:
> I added that code with commit e17eae2b8399 ("mm: pagewalk: fix walk for
> hugepage tables") because I was getting crazy displays when dumping
> /sys/kernel/debug/pagetables
>
> Huge pages can be used for many things.
>
> On powerpc 8xx, there are 4 possible page sizes: 4k, 16k, 512k and 8M.
> Each PGD entry addresses 4M areas, so hugepd is used for anything using
> 8M pages. Could have used regular page tables instead, but it is not
> worth allocating a 4k table when the HW will only read first entry.
>
> At the moment, the linear memory mapping is performed with 8M pages, so
> ptdump_walk_pgd() will walk into huge page directories.
>
> Also, huge pages can be used in vmalloc() and in vmap(). At the moment
> we support 512k pages there on the 8xx. 8M pages will be supported
> once vmalloc() and vmap() support hugepd, as explained in commit
> a6a8f7c4aa7e ("powerpc/8xx: add support for huge pages on VMAP and VMALLOC")
>
> So yes, in conclusion, hugepd is used outside hugetlbfs; I hope that
> clarifies things.
Yes it does, thanks a lot for all of your replies.
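To make the arithmetic in Christophe's description concrete, here is a small C sketch (not kernel code) that uses only the figures quoted above: each PGD entry maps 4M, and the 8xx page sizes are 4k, 16k, 512k and 8M. The helpers KiB(), MiB(), pgd_entries_spanned() and page_exceeds_pgd_entry() are hypothetical names chosen for illustration. It shows that 8M is the only size that cannot fit under a single PGD entry, which matches the statement that hugepd is used for 8M pages.

```c
#include <assert.h>

/*
 * Hypothetical illustration, not kernel code. Figures come from the
 * quoted 8xx description: one PGD entry maps 4M, and the supported
 * page sizes are 4k, 16k, 512k and 8M.
 */
#define KiB(x) ((unsigned long)(x) << 10)
#define MiB(x) ((unsigned long)(x) << 20)

#define PGD_ENTRY_SIZE MiB(4) /* area mapped by one PGD entry */

/* How many PGD entries a single page of the given size covers. */
static unsigned long pgd_entries_spanned(unsigned long page_size)
{
	assert(page_size != 0);
	if (page_size <= PGD_ENTRY_SIZE)
		return 1; /* fits under a single PGD entry */
	/* Exact division: all sizes here are powers of two >= 4M. */
	return page_size / PGD_ENTRY_SIZE;
}

/* True when one page is larger than the area of a single PGD entry. */
static int page_exceeds_pgd_entry(unsigned long page_size)
{
	return pgd_entries_spanned(page_size) > 1;
}
```

For example, pgd_entries_spanned(MiB(8)) returns 2, while 4k, 16k and 512k all return 1, so only the 8M case falls outside what a single PGD entry can describe.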
So I think this is what I missed: on Freescale ppc 8xx there's a special
hugepd_populate_kernel() defined to install kernel pgtables for hugepd.
Obviously I didn't look beyond hugepd_populate() when I first checked, and
stopped at the first definition of hugepd_populate(), on 64-bit ppc.
For this specific patch: I suppose it is still fine to reuse the fast-gup
function, because the hugetlb-only assumption still holds whenever a VMA is
present (GUP applies only to user mappings, so nothing like KASAN mappings
should ever show up). So AFAIU, in this case hugepd is still only used for
hugetlb pages, even on Freescale 8xx, and nothing should explode yet. So
maybe I can still keep the code changes.
However, the comments definitely need fixing (I'm going to add some, which
hch requested and I agree with). They are not yet in the patch I posted
here, but I'll refine them locally.
As for the series as a whole: its purpose is to start merging hugetlb
pgtable processing into generic mm. That is my takeaway from previous LSF/MM
discussions in the community on how we should move hugetlb forward, to
avoid duplicating code that already exists in generic mm. Hugetlb is more or
less blocked from adding new (especially large) features because of that
complexity. This series is all about that, though only a small step
towards it.
It seems there is a trend toward making hugepd more general. Christophe's
fix to the pgtable dump is exactly the kind of thing I would look for if I
keep going. I hope that's the right way to go.
I'll also need to think more about how this affects my plan; currently it
all seems fine, since I won't ever try to change any code specific to
kernel mappings. I suppose any hugetlbfs-based test should still cover all
the hugepd code I will touch. Then it should just work for kernel mappings
on Freescale 8xx; it would be great if, e.g., Christophe could help me
double-check that once the series stabilizes over a few versions. Any hints
on testing would be more than welcome, whether specific test cases or
general tips; currently I'm still looking for a usable ppc system, since
QEMU TCG ppc64 emulation on x86 is slow enough that I've already given up
on it.
Considering how special hugepd is to ppc, and the possibility that I'll
break it, there's yet another option: apply the new logic only to archs
with !ARCH_HAS_HUGEPD. That would make my life easier, but it also means
that, even if my attempt works out, anything new would exclude ppc by
default. We'd also end up with a bunch of "#ifdef ARCH_HAS_HUGEPD" in
generic code, which is not preferred either. For gup, it might be
relatively easy compared to the rest. I'm still undecided on the long-term
plan.
Please let me know if you have any thoughts on any of above.
Thanks!
--
Peter Xu