lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZlbsEb_T2eQYO-g4@localhost.localdomain>
Date: Wed, 29 May 2024 10:49:21 +0200
From: Oscar Salvador <osalvador@...e.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Jason Gunthorpe <jgg@...dia.com>, Peter Xu <peterx@...hat.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	Nicholas Piggin <npiggin@...il.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [RFC PATCH v4 13/16] powerpc/e500: Use contiguous PMD instead of
 hugepd

On Mon, May 27, 2024 at 03:30:11PM +0200, Christophe Leroy wrote:
> e500 supports many page sizes among which the following size are
> implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G.
> 
> On e500, TLB miss for hugepages is exclusively handled by SW even
> on e6500 which has HW assistance for 4k pages, so there are no
> constraints like on the 8xx.
> 
> On e500/32, all are at PGD/PMD level and can be handled as
> cont-PMD.
> 
> On e500/64, smaller ones are on PMD while bigger ones are on PUD.
> Again, they can easily be handled as cont-PMD and cont-PUD instead
> of hugepd.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>

..

> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index 90d6a0943b35..f7421d1a1693 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -52,11 +52,36 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
>  {
>  	pte_basic_t old = pte_val(*p);
>  	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
> +	unsigned long sz;
> +	unsigned long pdsize;
> +	int i;
>  
>  	if (new == old)
>  		return old;
>  
> -	*p = __pte(new);
> +#ifdef CONFIG_PPC_E500
> +	if (huge)
> +		sz = 1UL << (((old & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
> +	else

I think this will not compile when CONFIG_PPC_85xx && !CONFIG_PTE_64BIT.

You have declared _PAGE_HSIZE_MSK and _PAGE_HSIZE_SHIFT in
arch/powerpc/include/asm/nohash/hugetlb-e500.h.

But hugetlb-e500.h is only included if CONFIG_PPC_85xx && CONFIG_PTE_64BIT
(see arch/powerpc/include/asm/nohash/32/pgtable.h).



> +#endif
> +		sz = PAGE_SIZE;
> +
> +	if (!huge || sz < PMD_SIZE)
> +		pdsize = PAGE_SIZE;
> +	else if (sz < PUD_SIZE)
> +		pdsize = PMD_SIZE;
> +	else if (sz < P4D_SIZE)
> +		pdsize = PUD_SIZE;
> +	else if (sz < PGDIR_SIZE)
> +		pdsize = P4D_SIZE;
> +	else
> +		pdsize = PGDIR_SIZE;
> +
> +	for (i = 0; i < sz / pdsize; i++, p++) {
> +		*p = __pte(new);
> +		if (new)
> +			new += (unsigned long long)(pdsize / PAGE_SIZE) << PTE_RPN_SHIFT;

I guess 'new' can be 0 if pte_update() is called on behave of clearing the pte?

> +static inline unsigned long pmd_leaf_size(pmd_t pmd)
> +{
> +	return 1UL << (((pmd_val(pmd) & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);

Can we have the '20' somewhere defined with a comment on top explaining
what is so it is not a magic number?
Otherwise people might come look at this and wonder why 20.

> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -331,6 +331,37 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>  		__set_huge_pte_at(pmdp, ptep, pte_val(pte));
>  	}
>  }
> +#elif defined(CONFIG_PPC_E500)
> +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +		     pte_t pte, unsigned long sz)
> +{
> +	unsigned long pdsize;
> +	int i;
> +
> +	pte = set_pte_filter(pte, addr);
> +
> +	/*
> +	 * Make sure hardware valid bit is not set. We don't do
> +	 * tlb flush for this update.
> +	 */
> +	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> +
> +	if (sz < PMD_SIZE)
> +		pdsize = PAGE_SIZE;
> +	else if (sz < PUD_SIZE)
> +		pdsize = PMD_SIZE;
> +	else if (sz < P4D_SIZE)
> +		pdsize = PUD_SIZE;
> +	else if (sz < PGDIR_SIZE)
> +		pdsize = P4D_SIZE;
> +	else
> +		pdsize = PGDIR_SIZE;
> +
> +	for (i = 0; i < sz / pdsize; i++, ptep++, addr += pdsize) {
> +		__set_pte_at(mm, addr, ptep, pte, 0);
> +		pte = __pte(pte_val(pte) + ((unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT));

You can use pte_advance_pfn() here? Just give have

 nr = (unsigned long long)pdsize / PAGE_SIZE << PFN_PTE_SHIFT)
 pte_advance_pfn(pte, nr)

Which 'sz's can we have here? You mentioned that e500 support:

4M, 16M, 64M, 256M, 1G.

which of these ones can be huge?


-- 
Oscar Salvador
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ