linux-kernel - Re: [PATCH v8 1/6] ksm: support unsharing KSM-placed zero pages

Message-ID: <1925d301-462d-6b33-8867-4e1646b2dbd6@redhat.com>
Date:   Tue, 23 May 2023 11:41:24 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Yang Yang <yang.yang29@....com.cn>, akpm@...ux-foundation.org
Cc:     imbrenda@...ux.ibm.com, jiang.xuexin@....com.cn,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        ran.xiaokai@....com.cn, xu.xin.sc@...il.com, xu.xin16@....com.cn
Subject: Re: [PATCH v8 1/6] ksm: support unsharing KSM-placed zero pages

On 22.05.23 12:49, Yang Yang wrote:
> From: xu xin <xu.xin16@....com.cn>
> 
> When use_zero_pages of ksm is enabled, madvise(addr, len, MADV_UNMERGEABLE)
> and other ways of triggering unsharing (like writing 2 to
> /sys/kernel/mm/ksm/run) will *not* actually unshare the shared zeropage as
> placed by KSM (which is against the MADV_UNMERGEABLE documentation). As these
> KSM-placed zero pages are out of the control of KSM, the related counts of ksm
> pages don't expose how many zero pages are placed by KSM. (These special zero
> pages are different from those initially mapped zero pages, because the
> zero pages mapped to MADV_UNMERGEABLE areas are expected to be a complete
> and unshared page.)
> 
> To avoid blindly unsharing all shared zero pages in applicable VMAs, the patch
> uses pte_mkdirty (architecture-dependent) to mark KSM-placed zero pages.
> Thus, MADV_UNMERGEABLE will only unshare those KSM-placed zero pages.
> 
> The patch will not degrade the performance of use_zero_pages, as it doesn't
> change how empty pages are merged when use_zero_pages is enabled.
> 

Maybe add: "We'll reuse this mechanism to reliably identify KSM-placed 
zeropages to properly account for them (e.g., calculating the KSM profit 
that includes zeropages) next."

> Signed-off-by: xu xin <xu.xin16@....com.cn>
> Suggested-by: David Hildenbrand <david@...hat.com>
> Cc: Claudio Imbrenda <imbrenda@...ux.ibm.com>
> Cc: Xuexin Jiang <jiang.xuexin@....com.cn>
> Reviewed-by: Xiaokai Ran <ran.xiaokai@....com.cn>
> Reviewed-by: Yang Yang <yang.yang29@....com.cn>
> ---
>   include/linux/ksm.h | 6 ++++++
>   mm/ksm.c            | 5 +++--
>   2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 899a314bc487..7989200cdbb7 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -26,6 +26,9 @@ int ksm_disable(struct mm_struct *mm);
>   
>   int __ksm_enter(struct mm_struct *mm);
>   void __ksm_exit(struct mm_struct *mm);
> +/* use pte_mkdirty to track a KSM-placed zero page */
> +#define set_pte_ksm_zero(pte)	pte_mkdirty(pte_mkspecial(pte))

If there is only a single user (which I assume), please inline it instead.

Let's add some more documentation:

/*
  * To identify zeropages that were mapped by KSM, we reuse the dirty bit
  * in the PTE. If the PTE is dirty, the zeropage was mapped by KSM when
  * deduplicating memory.
  */

> +#define is_ksm_zero_pte(pte)	(is_zero_pfn(pte_pfn(pte)) && pte_dirty(pte))
>   
>   static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
>   {
> @@ -95,6 +98,9 @@ static inline void ksm_exit(struct mm_struct *mm)
>   {
>   }
>   
> +#define set_pte_ksm_zero(pte)	pte_mkspecial(pte)
> +#define is_ksm_zero_pte(pte)	0
> +
>   #ifdef CONFIG_MEMORY_FAILURE
>   static inline void collect_procs_ksm(struct page *page,
>   				     struct list_head *to_kill, int force_early)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 0156bded3a66..9962f5962afd 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -447,7 +447,8 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex
>   		if (is_migration_entry(entry))
>   			page = pfn_swap_entry_to_page(entry);
>   	}
> -	ret = page && PageKsm(page);
> +	/* return 1 if the page is a normal KSM page or a KSM-placed zero page */
> +	ret = (page && PageKsm(page)) || is_ksm_zero_pte(*pte);
>   	pte_unmap_unlock(pte, ptl);
>   	return ret;
>   }
> @@ -1220,7 +1221,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
>   		page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
>   		newpte = mk_pte(kpage, vma->vm_page_prot);
>   	} else {
> -		newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage),
> +		newpte = set_pte_ksm_zero(pfn_pte(page_to_pfn(kpage),
>   					       vma->vm_page_prot));
>   		/*
>   		 * We're replacing an anonymous page with a zero page, which is

Apart from that LGTM.
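
For readers following along, the dirty-bit marker discussed above can be modeled in plain userspace C. This is only a sketch, not kernel code: the toy_pte_t layout, the TOY_* bit positions, the TOY_ZERO_PFN value and all toy_* helpers are made up for illustration, and real PTE layouts are architecture-specific. It assumes a zeropage mapped by an ordinary read fault is special but clean, while a KSM-placed one is special and dirty, and it mirrors the `(page && PageKsm(page)) || is_ksm_zero_pte(*pte)` decision from the hunk in break_ksm_pmd_entry().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy PTE: low bits are flags, the remaining bits hold the PFN. */
typedef uint64_t toy_pte_t;

#define TOY_PTE_SPECIAL (1ULL << 0)  /* stand-in for pte_mkspecial() */
#define TOY_PTE_DIRTY   (1ULL << 1)  /* stand-in for pte_mkdirty() */
#define TOY_PFN_SHIFT   12
#define TOY_ZERO_PFN    0x42ULL      /* assumed PFN of the shared zeropage */

static toy_pte_t toy_pfn_pte(uint64_t pfn) { return pfn << TOY_PFN_SHIFT; }
static uint64_t toy_pte_pfn(toy_pte_t pte) { return pte >> TOY_PFN_SHIFT; }

/* Zeropage mapped by an ordinary read fault: special, but clean (assumption). */
static toy_pte_t toy_map_zero_by_fault(void)
{
	return toy_pfn_pte(TOY_ZERO_PFN) | TOY_PTE_SPECIAL;
}

/* Zeropage placed by KSM when deduplicating: special and dirty. */
static toy_pte_t toy_map_zero_by_ksm(void)
{
	return toy_pfn_pte(TOY_ZERO_PFN) | TOY_PTE_SPECIAL | TOY_PTE_DIRTY;
}

/* Models is_ksm_zero_pte(): zero PFN and dirty bit set. */
static bool toy_is_ksm_zero_pte(toy_pte_t pte)
{
	return toy_pte_pfn(pte) == TOY_ZERO_PFN && (pte & TOY_PTE_DIRTY) != 0;
}

/* Models the unshare decision: a KSM page, or a KSM-placed zeropage. */
static bool toy_needs_unshare(bool page_is_ksm, toy_pte_t pte)
{
	return page_is_ksm || toy_is_ksm_zero_pte(pte);
}
```

In this model, toy_needs_unshare(false, toy_map_zero_by_fault()) is false, so an initially mapped zeropage is left alone, while toy_needs_unshare(false, toy_map_zero_by_ksm()) is true.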

-- 
Thanks,

David / dhildenb