Message-ID: <7b4360a0-42ef-45ab-9e63-eace52943529@lucifer.local>
Date: Fri, 30 May 2025 12:14:31 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Dev Jain <dev.jain@....com>
Cc: akpm@...ux-foundation.org, david@...hat.com, catalin.marinas@....com,
        will@...nel.org, Liam.Howlett@...cle.com, vbabka@...e.cz,
        rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        suzuki.poulose@....com, steven.price@....com, gshan@...hat.com,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 3/3] mm/pagewalk: Add pre/post_pte_table callback for
 lazy MMU on arm64

On Fri, May 30, 2025 at 02:34:07PM +0530, Dev Jain wrote:
> arm64 implements lazy_mmu_mode to allow deferral and batching of barriers
> when updating kernel PTEs, which provides a nice performance boost. arm64
> currently uses apply_to_page_range() to modify kernel PTE permissions,
> which runs inside lazy_mmu_mode. So to prevent a performance regression,
> let's add hooks to walk_page_range_novma() to allow continued use of
> lazy_mmu_mode.
>
> Signed-off-by: Dev Jain <dev.jain@....com>
> ---
> Credits to Ryan for the patch description.
>
>  arch/arm64/mm/pageattr.c | 12 ++++++++++++
>  include/linux/pagewalk.h |  2 ++
>  mm/pagewalk.c            |  6 ++++++
>  3 files changed, 20 insertions(+)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index a5c829c64969..9163324b12a0 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -75,11 +75,23 @@ static int pageattr_pte_entry(pte_t *pte, unsigned long addr,
>  	return 0;
>  }
>
> +static void pte_lazy_mmu_enter(void)
> +{
> +	arch_enter_lazy_mmu_mode();
> +}

Hm, am I missing something? I don't see this function or the leave version
defined in arch/arm64?

Nor do I see __HAVE_ARCH_ENTER_LAZY_MMU_MODE?
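
For reference, unless the arch defines __HAVE_ARCH_ENTER_LAZY_MMU_MODE, the
generic fallback in include/linux/pgtable.h turns these into no-ops, roughly:

        #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
        #define arch_enter_lazy_mmu_mode()      do {} while (0)
        #define arch_leave_lazy_mmu_mode()      do {} while (0)
        #define arch_flush_lazy_mmu_mode()      do {} while (0)
        #endif

So without an arm64 definition of these, the new hooks would compile to
nothing here.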

> +
> +static void pte_lazy_mmu_leave(void)
> +{
> +	arch_leave_lazy_mmu_mode();
> +}

Are you absolutely sure you will never need to hook this stuff on higher-level
page tables?

If this relates to vmalloc, then we do have huge page mappings in the vmalloc
logic, no?

> +
>  static const struct mm_walk_ops pageattr_ops = {
>  	.pud_entry	= pageattr_pud_entry,
>  	.pmd_entry	= pageattr_pmd_entry,
>  	.pte_entry	= pageattr_pte_entry,
>  	.walk_lock	= PGWALK_NOLOCK,
> +	.pre_pte_table	= pte_lazy_mmu_enter,
> +	.post_pte_table	= pte_lazy_mmu_leave,

This is kind of horrid really. Are we sure lazy MMU mode is valid for
everything that occurs within the loop? I suppose it's only simple logic for
the ops->pte_entry stuff.

But it feels like hacking something in for this specific case.

At the same time I don't want to get in the way of an optimisation. We could do
something in ops->pmd_entry, but then we'd not get to turn it off afterwards...

Same for any higher-level page table, hm.

Is this really the only way to get this? I guess it's not feasible having this
just switched on for the whole operation...

I just fear that we could end up populating these mm_walk_ops with every corner
case thing we think of.
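
To illustrate what I mean by switching it on for the whole operation, a very
rough sketch (caller name is hypothetical, locking elided):

        static int change_kernel_range(unsigned long start, unsigned long end,
                                       struct pageattr_masks *data)
        {
                int ret;

                /* Batch barriers across the whole walk, not per PTE table. */
                arch_enter_lazy_mmu_mode();
                ret = walk_page_range_novma(&init_mm, start, end,
                                            &pageattr_ops, NULL, data);
                arch_leave_lazy_mmu_mode();

                return ret;
        }

Though I suspect that's not valid if anything in the walk can sleep or
allocate, hence my hesitation.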

>  };
>
>  bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 9bc8853ed3de..2157d345974c 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -88,6 +88,8 @@ struct mm_walk_ops {
>  	int (*pre_vma)(unsigned long start, unsigned long end,
>  		       struct mm_walk *walk);
>  	void (*post_vma)(struct mm_walk *walk);
> +	void (*pre_pte_table)(void);
> +	void (*post_pte_table)(void);
>  	int (*install_pte)(unsigned long addr, unsigned long next,
>  			   pte_t *ptep, struct mm_walk *walk);
>  	enum page_walk_lock walk_lock;
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 9657cf4664b2..a441f5cbbc45 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -33,6 +33,9 @@ static int walk_pte_range_inner(pte_t *pte, unsigned long addr,
>  	const struct mm_walk_ops *ops = walk->ops;
>  	int err = 0;
>
> +	if (walk->ops->pre_pte_table)
> +		walk->ops->pre_pte_table();

NIT: you have 'ops' already, no need for walk-> :)
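
i.e. just:

        if (ops->pre_pte_table)
                ops->pre_pte_table();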

> +
>  	for (;;) {
>  		if (ops->install_pte && pte_none(ptep_get(pte))) {
>  			pte_t new_pte;
> @@ -56,6 +59,9 @@ static int walk_pte_range_inner(pte_t *pte, unsigned long addr,
>  		addr += PAGE_SIZE;
>  		pte++;
>  	}
> +
> +	if (walk->ops->post_pte_table)
> +		walk->ops->post_pte_table();

NIT: same as above.

>  	return err;
>  }
>
> --
> 2.30.2
>
