Message-ID: <20240327165754.GM946323@nvidia.com>
Date: Wed, 27 Mar 2024 13:57:54 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Peter Xu <peterx@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
On Wed, Mar 27, 2024 at 09:58:35AM +0000, Christophe Leroy wrote:
> > Just general remarks on the ones with huge pages:
> >
> > hash 64k and hugepage 16M/16G
> > radix 64k/radix hugepage 2M/1G
> > radix 4k/radix hugepage 2M/1G
> > nohash 32
> > - I think this is just a normal x86 like scheme? PMD/PUD can be a
> > leaf with the same size as a next level table.
> >
> > Do any of these cases need to know the higher level to parse the
> > lower? eg is there a 2M bit in the PUD indicating that the PMD
> > is a table of 2M leafs or does each PMD entry have a bit
> > indicating it is a leaf?
>
> For hash and radix there is a bit that tells it is leaf (_PAGE_PTE)
>
> For nohash32/e500 I think the drawing is not fully right: there is a huge
> page directory (hugepd) with a single entry. I think it should be
> possible to change it to a leaf entry, since it seems we have bit
> _PAGE_SW1 available in the PTE.
It sounds to me like PPC breaks down into only a couple of fundamental
behaviors:
 - x86 like leaf in many page levels. Use the pgd/pud/pmd_leaf() and
   related helpers to implement it
 - ARM like contig PTE within a single page table level. Use the
   contig stuff to implement it
 - Contig PTE across two page table levels with a bit in the
   PMD. Needs new support like you showed
 - Page table levels with a variable page size. Ie a PUD can point to
   a directory of 8 pages or 512 pages of different size. Probably
   needs some new core support, but I think your changes to the
   *_offset go a long way already.
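
As a rough illustration of the first two cases (not kernel code), here
is a userspace sketch of how a walker could size a mapping from
per-entry bits. All names (TOY_LEAF, TOY_CONT, toy_map_size) and bit
positions are invented for this example, and the geometry (4k base
page, 512 entries per level, 16-entry contig runs) is an assumption,
not PPC's actual layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software bits, chosen only for this sketch. */
#define TOY_LEAF (1ULL << 0) /* entry maps memory, not a next-level table */
#define TOY_CONT (1ULL << 1) /* entry is part of a contiguous run */

#define TOY_PAGE_SHIFT 12 /* assumed 4k base page */
#define TOY_LEVEL_BITS 9  /* assumed 512 entries per table */
#define TOY_CONT_NR 16    /* assumed number of entries per contiguous run */

/* Bytes covered by one entry at 'level' (0 = PTE, 1 = PMD, 2 = PUD). */
static uint64_t toy_level_size(unsigned int level)
{
	assert(level <= 3);
	return 1ULL << (TOY_PAGE_SHIFT + level * TOY_LEVEL_BITS);
}

/*
 * Size of the mapping that 'entry' at 'level' belongs to, or 0 if the
 * entry is a table pointer and the walk has to descend one level.
 */
static uint64_t toy_map_size(uint64_t entry, unsigned int level)
{
	if (!(entry & TOY_LEAF))
		return 0;
	if (entry & TOY_CONT)
		return TOY_CONT_NR * toy_level_size(level);
	return toy_level_size(level);
}
```

Under these assumptions a leaf PMD covers 2M, a leaf PUD covers 1G, and
a contig run of PTEs covers 64k. Neither case needs any information
from the level above to be parsed.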
> >
> > hash 4k and hugepage 16M/16G
> > nohash 64
> > - How does this work? I guess since 8xx explicitly calls out
> > consecutive this is actually the pgd can point to 512 256M
> > entries or 8 16G entries? Ie the table size at each level is
> > variable? Or is it the same and the table size is still 512 and
> > each 16G entry is replicated 64 times?
>
> For those it is using the huge page directory (hugepd), which can be
> hooked at any level and is a directory of huge pages on its own. There
> are no consecutive entries involved here I think, although I'm not
> completely sure.
>
> For hash4k I'm not sure how it works, this was changed by commit
> e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a
> different page table format")
>
> For the nohash/64, a PGD entry points either to a regular PUD directory
> or to a HUGEPD directory. The size of the HUGEPD directory is encoded in
> the 6 lower bits of the PGD entry.
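
A minimal sketch of that kind of encoding, assuming the six low bits
hold a shift value and the remaining bits hold a suitably aligned table
address. The toy_* names are hypothetical and this is not the kernel's
actual hugepd code:

```c
#include <assert.h>
#include <stdint.h>

/* The six low bits of the entry, as described above. */
#define TOY_HUGEPD_SHIFT_MASK 0x3fULL

/* Pack a table address and a shift into one entry (assumed layout). */
static uint64_t toy_hugepd_encode(uint64_t table, unsigned int shift)
{
	/* The table must be 64-byte aligned so the low bits are free. */
	assert((table & TOY_HUGEPD_SHIFT_MASK) == 0);
	assert(shift <= TOY_HUGEPD_SHIFT_MASK);
	return table | shift;
}

/* Recover the shift stored in the six low bits. */
static unsigned int toy_hugepd_shift(uint64_t entry)
{
	return (unsigned int)(entry & TOY_HUGEPD_SHIFT_MASK);
}

/* Recover the table address by clearing the six low bits. */
static uint64_t toy_hugepd_table(uint64_t entry)
{
	return entry & ~TOY_HUGEPD_SHIFT_MASK;
}
```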
If it is a software walker there might be value in just aligning to
the contig pte scheme in all levels and forgetting about the variable
size page table levels. That quarter page stuff is a PITA to manage
the memory allocation for on PPC anyhow..
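
To put numbers on the replicated-entry variant from the question
above: if every slot at a level covers 256M, a 16G page occupies
16G / 256M = 64 consecutive slots. A small sketch of that arithmetic,
with hypothetical toy_* helper names:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of consecutive same-level entries needed to describe one huge
 * page of size (1 << huge_shift) when each entry covers
 * (1 << entry_shift) bytes.
 */
static unsigned int toy_nr_replicas(unsigned int huge_shift,
				    unsigned int entry_shift)
{
	assert(huge_shift >= entry_shift);
	assert(huge_shift - entry_shift < 32);
	return 1U << (huge_shift - entry_shift);
}

/*
 * Index of the first entry of the run that contains 'idx'.
 * 'nr' must be a power of two, as returned by toy_nr_replicas().
 */
static unsigned int toy_run_start(unsigned int idx, unsigned int nr)
{
	assert(nr != 0 && (nr & (nr - 1)) == 0);
	return idx & ~(nr - 1);
}
```

With huge_shift = 34 (16G) and entry_shift = 28 (256M) this gives 64
replicas, and the table keeps a fixed 512 entries.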
Jason