Message-ID: <3b7ff190-4efe-47d0-82fb-68135a031b0f@kernel.org>
Date: Mon, 8 Dec 2025 12:19:41 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: SeongJae Park <sj@...nel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>, Jann Horn <jannh@...gle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Michal Hocko
<mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>,
Pedro Falcato <pfalcato@...e.de>, Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH v3 05/37] mm/{mprotect,memory}: (no upstream-aimed
hack) implement MM_CP_DAMON
On 12/8/25 07:29, SeongJae Park wrote:
> Note that this is not upstreamable as-is. It is only meant to help the
> discussion of the other changes in this series.
>
> DAMON uses the Accessed bits of page table entries as its major source
> of access information. That source lacks some additional details, such
> as which CPU made the access. Page faults could provide such
> additional information.
>
> Implement another change_protection() flag for such use cases, namely
> MM_CP_DAMON. DAMON will install PAGE_NONE protections using the flag.
> To avoid interfering with NUMA_BALANCING, which also uses PAGE_NONE
> protections, pass the faults to DAMON only when NUMA_BALANCING is
> disabled.
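
(Side note for context, not part of the patch: the DAMON-side caller is
not in this hunk. I assume it would mirror change_prot_numa() in
mm/mempolicy.c, roughly like the completely untested sketch below;
damon_protect_region() is a made-up name.)

static long damon_protect_region(struct vm_area_struct *vma,
				 unsigned long start, unsigned long end)
{
	struct mmu_gather tlb;
	long nr_updated;

	/* Caller must hold the mmap lock, as for change_prot_numa(). */
	mmap_assert_locked(vma->vm_mm);

	/* Install PAGE_NONE so the next access faults into do_damon_page(). */
	tlb_gather_mmu(&tlb, vma->vm_mm);
	nr_updated = change_protection(&tlb, vma, start, end, MM_CP_DAMON);
	tlb_finish_mmu(&tlb);

	return nr_updated;
}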
>
> Again, this is not upstreamable as-is. There were comments on the
> previous version, and I was unable to find time to address them, so
> this version does not address any of those previous comments. I'm
> sending it anyway to help the discussion of the other patches in this
> series. Please forgive me for adding this to your inbox without
> addressing your comments, and feel free to ignore it. I will start a
> separate discussion for this part later.
>
> Signed-off-by: SeongJae Park <sj@...nel.org>
> ---
> include/linux/mm.h | 1 +
> mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++++--
> mm/mprotect.c | 5 ++++
> 3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 553cf9f438f1..2cba5a0196da 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2848,6 +2848,7 @@ int get_cmdline(struct task_struct *task, char *buffer, int buflen);
> #define MM_CP_UFFD_WP_RESOLVE (1UL << 3) /* Resolve wp */
> #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
> MM_CP_UFFD_WP_RESOLVE)
> +#define MM_CP_DAMON (1UL << 4)
>
> bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> pte_t pte);
> diff --git a/mm/memory.c b/mm/memory.c
> index 6675e87eb7dd..5dc85adb1e59 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -78,6 +78,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/pgalloc.h>
> #include <linux/uaccess.h>
> +#include <linux/damon.h>
>
> #include <trace/events/kmem.h>
>
> @@ -6172,6 +6173,54 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
> return VM_FAULT_FALLBACK;
> }
>
> +/*
> + * NOTE: This is only a PoC-purpose "hack" that will not be upstreamed as-is.
> + * More discussion between all stakeholders, including the maintainers of MM
> + * core, NUMA balancing, and DAMON, is needed to make this upstreamable.
> + * (https://lore.kernel.org/20251128193947.80866-1-sj@kernel.org)
> + *
> + * This function is called from the page fault handler, for page faults on
> + * P{TE,MD}-protected but vma-accessible pages. DAMON installs the fake
> + * protection for access sampling purposes. This function simply clears the
> + * protection and reports the access to DAMON, by calling
> + * damon_report_page_fault().
> + *
> + * The protection-clearing code is copied from the NUMA fault handling code
> + * for PTEs. Again, this is only a PoC-purpose "hack" to show what information
> + * DAMON wants from page fault events, rather than an upstream-aimed version.
> + */
> +static vm_fault_t do_damon_page(struct vm_fault *vmf, bool huge_pmd)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct folio *folio;
> + pte_t pte, old_pte;
> + bool writable = false, ignore_writable = false;
> + bool pte_write_upgrade = vma_wants_manual_pte_write_upgrade(vma);
> +
> + spin_lock(vmf->ptl);
> + old_pte = ptep_get(vmf->pte);
> + if (unlikely(!pte_same(old_pte, vmf->orig_pte))) {
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> + return 0;
> + }
> + pte = pte_modify(old_pte, vma->vm_page_prot);
> + writable = pte_write(pte);
> + if (!writable && pte_write_upgrade &&
> + can_change_pte_writable(vma, vmf->address, pte))
> + writable = true;
> + folio = vm_normal_folio(vma, vmf->address, pte);
> + if (folio && folio_test_large(folio))
> + numa_rebuild_large_mapping(vmf, vma, folio, pte,
> + ignore_writable, pte_write_upgrade);
> + else
> + numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte,
> + writable);
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> +
> + damon_report_page_fault(vmf, huge_pmd);
> + return 0;
> +}
> +
> /*
> * These routines also need to handle stuff like marking pages dirty
> * and/or accessed for architectures that don't do it in hardware (most
> @@ -6236,8 +6285,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> if (!pte_present(vmf->orig_pte))
> return do_swap_page(vmf);
>
> - if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
> + if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) {
> + if (sysctl_numa_balancing_mode == NUMA_BALANCING_DISABLED)
> + return do_damon_page(vmf, false);
> return do_numa_page(vmf);
> + }
>
> spin_lock(vmf->ptl);
> entry = vmf->orig_pte;
> @@ -6363,8 +6415,12 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> return 0;
> }
> if (pmd_trans_huge(vmf.orig_pmd)) {
> - if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
> + if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma)) {
> + if (sysctl_numa_balancing_mode ==
> + NUMA_BALANCING_DISABLED)
> + return do_damon_page(&vmf, true);
> return do_huge_pmd_numa_page(&vmf);
> + }
I recall that we had a similar discussion already. Ah, it was around
the arm64 MTE tag storage reuse work [1].
The idea was to let do_*_numa_page() handle the restoring so we don't
end up with such duplicated code.
[1]
https://lore.kernel.org/all/20240125164256.4147-1-alexandru.elisei@arm.com/
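
Roughly what I mean (completely untested sketch; the PMD side would get
the same treatment): keep routing those faults through do_numa_page(),
and have it hand the event to DAMON once the mapping has been restored,
instead of duplicating the restore logic in a separate do_damon_page().
Something like a small helper, called right after the mapping rebuild;
damon_hinting_fault() is just a name I made up:

/*
 * Called from do_numa_page()/do_huge_pmd_numa_page() after the protnone
 * mapping has been restored.  Returns true when the fault was a DAMON
 * sampling fault and the NUMA hinting logic should be skipped.
 */
static bool damon_hinting_fault(struct vm_fault *vmf, bool huge_pmd)
{
	if (sysctl_numa_balancing_mode != NUMA_BALANCING_DISABLED)
		return false;

	/* protnone came from MM_CP_DAMON, not from NUMA hinting. */
	damon_report_page_fault(vmf, huge_pmd);
	return true;
}
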
--
Cheers
David