lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5820954.lOV4Wx5bFT@natalenko.name>
Date:   Tue, 10 May 2022 15:30:36 +0200
From:   Oleksandr Natalenko <oleksandr@...alenko.name>
To:     akpm@...ux-foundation.org, cgel.zte@...il.com
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, corbet@....net, xu xin <xu.xin16@....com.cn>,
        Yang Yang <yang.yang29@....com.cn>,
        Ran Xiaokai <ran.xiaokai@....com.cn>,
        wangyong <wang.yong12@....com.cn>,
        Yunkai Zhang <zhang.yunkai@....com.cn>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v6] mm/ksm: introduce ksm_force for each process

Hello.

On úterý 10. května 2022 14:22:42 CEST cgel.zte@...il.com wrote:
> From: xu xin <xu.xin16@....com.cn>
> 
> To use KSM, we have to explicitly call madvise() in application code,
> which means installed apps on OS needs to be uninstall and source code
> needs to be modified. It is inconvenient.
> 
> In order to change this situation, We add a new proc file ksm_force
> under /proc/<pid>/ to support turning on/off KSM scanning of a
> process's mm dynamically.
> 
> If ksm_force is set to 1, force all anonymous and 'qualified' VMAs
> of this mm to be involved in KSM scanning without explicitly calling
> madvise to mark VMA as MADV_MERGEABLE. But It is effective only when
> the klob of /sys/kernel/mm/ksm/run is set as 1.
> 
> If ksm_force is set to 0, cancel the feature of ksm_force of this
> process (fallback to the default state) and unmerge those merged pages
> belonging to VMAs which is not madvised as MADV_MERGEABLE of this process,
> but still leave MADV_MERGEABLE areas merged.

To my best knowledge, last time a forcible KSM was discussed (see threads [1], [2], [3] and probably others) it was concluded that a) procfs was a horrible interface for things like this one; and b) process_madvise() syscall was among the best suggested places to implement this (which would require a more tricky handling from userspace, but still).

So, what changed since that discussion?

P.S. For now I do it via dedicated syscall, but I'm not trying to upstream this approach.

[1] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz/
[2] https://lore.kernel.org/all/20190515145151.GG16651@dhcp22.suse.cz/T/#u
[3] https://lore.kernel.org/lkml/20190516172452.GA2106@avx2/
[4] https://gitlab.com/post-factum/pf-kernel/-/commits/ksm-5.17/

> Signed-off-by: xu xin <xu.xin16@....com.cn>
> Reviewed-by: Yang Yang <yang.yang29@....com.cn>
> Reviewed-by: Ran Xiaokai <ran.xiaokai@....com.cn>
> Reviewed-by: wangyong <wang.yong12@....com.cn>
> Reviewed-by: Yunkai Zhang <zhang.yunkai@....com.cn>
> Suggested-by: Matthew Wilcox <willy@...radead.org>
> ---
> v6:
> - modify the way of "return"
> - remove unnecessary words in Documentation/admin-guide/mm/ksm.rst
> - add additional notes to "set 0 to ksm_force" in Documentation/../ksm.rst
> and Documentation/../proc.rst
> v5:
> - fix typos in Documentation/filesystem/proc.rst
> v4:
> - fix typos in commit log
> - add interface descriptions under Documentation/
> v3:
> - fix compile error of mm/ksm.c
> v2:
> - fix a spelling error in commit log.
> - remove a redundant condition check in ksm_force_write().
> ---
>  Documentation/admin-guide/mm/ksm.rst | 19 +++++-
>  Documentation/filesystems/proc.rst   | 17 +++++
>  fs/proc/base.c                       | 93 ++++++++++++++++++++++++++++
>  include/linux/mm_types.h             |  9 +++
>  mm/ksm.c                             | 32 +++++++++-
>  5 files changed, 167 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
> index b244f0202a03..8cabc2504005 100644
> --- a/Documentation/admin-guide/mm/ksm.rst
> +++ b/Documentation/admin-guide/mm/ksm.rst
> @@ -32,7 +32,7 @@ are swapped back in: ksmd must rediscover their identity and merge again).
>  Controlling KSM with madvise
>  ============================
>  
> -KSM only operates on those areas of address space which an application
> +KSM can operates on those areas of address space which an application
>  has advised to be likely candidates for merging, by using the madvise(2)
>  system call::
>  
> @@ -70,6 +70,23 @@ Applications should be considerate in their use of MADV_MERGEABLE,
>  restricting its use to areas likely to benefit.  KSM's scans may use a lot
>  of processing power: some installations will disable KSM for that reason.
>  
> +Controlling KSM with procfs
> +===========================
> +
> +KSM can also operate on anonymous areas of address space of those processes's
> +knob ``/proc/<pid>/ksm_force`` is on, even if app codes doesn't call madvise()
> +explicitly to advise specific areas as MADV_MERGEABLE.
> +
> +You can set ksm_force to 1 to force all anonymous and qualified VMAs of
> +this process to be involved in KSM scanning.
> +	e.g. ``echo 1 > /proc/<pid>/ksm_force``
> +
> +You can also set ksm_force to 0 to cancel that force feature of this process
> +and unmerge those merged pages which belongs to those VMAs not marked as
> +MADV_MERGEABLE of this process. But that still leave those pages belonging to
> +VMAs marked as MADV_MERGEABLE merged (fallback to the default state).
> +	e.g. ``echo 0 > /proc/<pid>/ksm_force``
> +
>  .. _ksm_sysfs:
>  
>  KSM daemon sysfs interface
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 061744c436d9..8890b8b457a4 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -47,6 +47,7 @@ fixes/update part 1.1  Stefani Seibold <stefani@...bold.net>    June 9 2009
>    3.10  /proc/<pid>/timerslack_ns - Task timerslack value
>    3.11	/proc/<pid>/patch_state - Livepatch patch operation state
>    3.12	/proc/<pid>/arch_status - Task architecture specific information
> +  3.13	/proc/<pid>/ksm_force - Setting of mandatory involvement in KSM
>  
>    4	Configuring procfs
>    4.1	Mount options
> @@ -2176,6 +2177,22 @@ AVX512_elapsed_ms
>    the task is unlikely an AVX512 user, but depends on the workload and the
>    scheduling scenario, it also could be a false negative mentioned above.
>  
> +3.13	/proc/<pid>/ksm_force - Setting of mandatory involvement in KSM
> +-----------------------------------------------------------------------
> +When CONFIG_KSM is enabled, this file can be used to specify if this
> +process's anonymous memory can be involved in KSM scanning without app codes
> +explicitly calling madvise to mark memory address as MADV_MERGEABLE.
> +
> +If writing 1 to this file, the kernel will force all anonymous and qualified
> +memory to be involved in KSM scanning without explicitly calling madvise to
> +mark memory address as MADV_MERGEABLE. But that is effective only when the
> +klob of '/sys/kernel/mm/ksm/run' is set as 1.
> +
> +If writing 0 to this file, the mandatory KSM feature of this process's will
> +be cancelled and unmerge those merged pages which belongs to those areas not
> +marked as MADV_MERGEABLE of this process, but leave those pages belonging to
> +areas marked as MADV_MERGEABLE merged (fallback to the default state).
> +
>  Chapter 4: Configuring procfs
>  =============================
>  
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 8dfa36a99c74..d60f7342f79e 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -96,6 +96,7 @@
>  #include <linux/time_namespace.h>
>  #include <linux/resctrl.h>
>  #include <linux/cn_proc.h>
> +#include <linux/ksm.h>
>  #include <trace/events/oom.h>
>  #include "internal.h"
>  #include "fd.h"
> @@ -3168,6 +3169,96 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *
>  
>  	return 0;
>  }
> +
> +static ssize_t ksm_force_read(struct file *file, char __user *buf, size_t count,
> +				loff_t *ppos)
> +{
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	char buffer[PROC_NUMBUF];
> +	ssize_t len;
> +	int ret;
> +
> +	task = get_proc_task(file_inode(file));
> +	if (!task)
> +		return -ESRCH;
> +
> +	mm = get_task_mm(task);
> +	ret = 0;
> +	if (mm) {
> +		len = snprintf(buffer, sizeof(buffer), "%d\n", mm->ksm_force);
> +		ret =  simple_read_from_buffer(buf, count, ppos, buffer, len);
> +		mmput(mm);
> +	}
> +
> +	return ret;
> +}
> +
> +static ssize_t ksm_force_write(struct file *file, const char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	char buffer[PROC_NUMBUF];
> +	int force;
> +	int err = 0;
> +
> +	memset(buffer, 0, sizeof(buffer));
> +	if (count > sizeof(buffer) - 1)
> +		count = sizeof(buffer) - 1;
> +	if (copy_from_user(buffer, buf, count))
> +		return -EFAULT;
> +
> +	err = kstrtoint(strstrip(buffer), 0, &force);
> +	if (err)
> +		return err;
> +
> +	if (force != 0 && force != 1)
> +		return -EINVAL;
> +
> +	task = get_proc_task(file_inode(file));
> +	if (!task)
> +		return -ESRCH;
> +
> +	mm = get_task_mm(task);
> +	if (!mm)
> +		goto out_put_task;
> +
> +	if (mm->ksm_force != force) {
> +		if (mmap_write_lock_killable(mm)) {
> +			err = -EINTR;
> +			goto out_mmput;
> +		}
> +
> +		if (force == 0)
> +			mm->ksm_force = force;
> +		else {
> +			/*
> +			 * Force anonymous pages of this mm to be involved in KSM merging
> +			 * without explicitly calling madvise.
> +			 */
> +			if (!test_bit(MMF_VM_MERGEABLE, &mm->flags))
> +				err = __ksm_enter(mm);
> +			if (!err)
> +				mm->ksm_force = force;
> +		}
> +
> +		mmap_write_unlock(mm);
> +	}
> +
> +out_mmput:
> +	mmput(mm);
> +out_put_task:
> +	put_task_struct(task);
> +
> +	return err < 0 ? err : count;
> +}
> +
> +static const struct file_operations proc_pid_ksm_force_operations = {
> +	.read		= ksm_force_read,
> +	.write		= ksm_force_write,
> +	.llseek		= generic_file_llseek,
> +};
>  #endif /* CONFIG_KSM */
>  
>  #ifdef CONFIG_STACKLEAK_METRICS
> @@ -3303,6 +3394,7 @@ static const struct pid_entry tgid_base_stuff[] = {
>  #endif
>  #ifdef CONFIG_KSM
>  	ONE("ksm_merging_pages",  S_IRUSR, proc_pid_ksm_merging_pages),
> +	REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations),
>  #endif
>  };
>  
> @@ -3639,6 +3731,7 @@ static const struct pid_entry tid_base_stuff[] = {
>  #endif
>  #ifdef CONFIG_KSM
>  	ONE("ksm_merging_pages",  S_IRUSR, proc_pid_ksm_merging_pages),
> +	REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations),
>  #endif
>  };
>  
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index b34ff2cdbc4f..1b1592c2f5cf 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -661,6 +661,15 @@ struct mm_struct {
>  		 * merging.
>  		 */
>  		unsigned long ksm_merging_pages;
> +		/*
> +		 * If true, force anonymous pages of this mm to be involved in KSM
> +		 * merging without explicitly calling madvise. It is effctive only
> +		 * when the klob of '/sys/kernel/mm/ksm/run' is set as 1. If false,
> +		 * cancel the feature of ksm_force of this process and unmerge
> +		 * those merged pages which is not madvised as MERGEABLE of this
> +		 * process, but leave MERGEABLE areas merged.
> +		 */
> +		bool ksm_force;
>  #endif
>  	} __randomize_layout;
>  
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 38360285497a..c9f672dcc72e 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -334,6 +334,34 @@ static void __init ksm_slab_free(void)
>  	mm_slot_cache = NULL;
>  }
>  
> +/* Check if vma is qualified for ksmd scanning */
> +static bool ksm_vma_check(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_flags = vma->vm_flags;
> +
> +	if (!(vma->vm_flags & VM_MERGEABLE) && !(vma->vm_mm->ksm_force))
> +		return false;
> +
> +	if (vm_flags & (VM_SHARED	| VM_MAYSHARE	|
> +			VM_PFNMAP	| VM_IO | VM_DONTEXPAND |
> +			VM_HUGETLB	| VM_MIXEDMAP))
> +		return false;       /* just ignore this vma*/
> +
> +	if (vma_is_dax(vma))
> +		return false;
> +
> +#ifdef VM_SAO
> +	if (vm_flags & VM_SAO)
> +		return false;
> +#endif
> +#ifdef VM_SPARC_ADI
> +	if (vm_flags & VM_SPARC_ADI)
> +		return false;
> +#endif
> +
> +	return true;
> +}
> +
>  static __always_inline bool is_stable_node_chain(struct stable_node *chain)
>  {
>  	return chain->rmap_hlist_len == STABLE_NODE_CHAIN;
> @@ -523,7 +551,7 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
>  	if (ksm_test_exit(mm))
>  		return NULL;
>  	vma = vma_lookup(mm, addr);
> -	if (!vma || !(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
> +	if (!vma || !ksm_vma_check(vma) || !vma->anon_vma)
>  		return NULL;
>  	return vma;
>  }
> @@ -2297,7 +2325,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
>  		vma = find_vma(mm, ksm_scan.address);
>  
>  	for (; vma; vma = vma->vm_next) {
> -		if (!(vma->vm_flags & VM_MERGEABLE))
> +		if (!ksm_vma_check(vma))
>  			continue;
>  		if (ksm_scan.address < vma->vm_start)
>  			ksm_scan.address = vma->vm_start;
> 


-- 
Oleksandr Natalenko (post-factum)


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ