Message-ID: <b978ec91-306f-45d5-8d88-91febebb8e48@redhat.com>
Date: Tue, 8 Apr 2025 14:36:16 -0400
From: Waiman Long <llong@...hat.com>
To: Gregory Price <gourry@...rry.net>, linux-mm@...ck.org
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, tj@...nel.org, hannes@...xchg.org, mkoutny@...e.com,
 akpm@...ux-foundation.org
Subject: Re: [RFC PATCH] vmscan,cgroup: apply mems_effective to reclaim

On 3/20/25 5:09 PM, Gregory Price wrote:
> It is possible for a reclaimer to cause demotions of an lruvec belonging
> to a cgroup with cpuset.mems set to exclude some nodes. Attempt to apply
> this limitation based on the lruvec's memcg and prevent demotion.
>
> Notably, this may still allow demotion of shared libraries or any memory
> first instantiated in another cgroup. This means cpusets still cannot
> guarantee complete isolation when demotion is enabled, and the
> docs have been updated to reflect this.
>
>
> Note: This is a fairly hacked up method that probably overlooks some
>        cgroup/cpuset controls or designs. RFCing now for some discussion
>        at LSFMM '25.
>
>
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
>   .../ABI/testing/sysfs-kernel-mm-numa          | 14 +++++---
>   include/linux/cpuset.h                        |  2 ++
>   kernel/cgroup/cpuset.c                        | 10 ++++++
>   mm/vmscan.c                                   | 32 ++++++++++++-------
>   4 files changed, 41 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> index 77e559d4ed80..27cdcab901f7 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa
> @@ -16,9 +16,13 @@ Description:	Enable/disable demoting pages during reclaim
>   		Allowing page migration during reclaim enables these
>   		systems to migrate pages from fast tiers to slow tiers
>   		when the fast tier is under pressure.  This migration
> -		is performed before swap.  It may move data to a NUMA
> -		node that does not fall into the cpuset of the
> -		allocating process which might be construed to violate
> -		the guarantees of cpusets.  This should not be enabled
> -		on systems which need strict cpuset location
> +		is performed before swap if an eligible numa node is
> +		present in cpuset.mems for the cgroup. If cpusets.mems
> +		changes at runtime, it may move data to a NUMA node that
> +		does not fall into the cpuset of the new cpusets.mems,
> +		which might be construed to violate the guarantees of
> +		cpusets.  Shared memory, such as libraries, owned by
> +		another cgroup may still be demoted and result in memory
> +		use on a node not present in cpusets.mem. This should not
> +		be enabled on systems which need strict cpuset location
>   		guarantees.
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index 835e7b793f6a..d4169f1b1719 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -171,6 +171,8 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>   	task_unlock(current);
>   }
>   
> +bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid);
> +
>   #else /* !CONFIG_CPUSETS */
>   

You should also define an inline stub of memcg_mems_allowed() for the
!CONFIG_CPUSETS case.

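A minimal sketch of such a stub, not a tested patch. It assumes that "every node is allowed" is the right answer when cpusets are compiled out, which matches the fallback in the posted cpuset.c version when no cpuset css is found. The forward declaration of struct mem_cgroup and the standalone #ifdef are only there so the fragment compiles on its own outside the kernel tree.

```c
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup;	/* opaque here; the real definition lives in the kernel */

#ifdef CONFIG_CPUSETS
/* Real implementation is provided by kernel/cgroup/cpuset.c. */
bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid);
#else /* !CONFIG_CPUSETS */
/*
 * Assumption: with cpusets compiled out there is no mems restriction,
 * so demotion to any node is permitted.
 */
static inline bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid)
{
	(void)memcg;
	(void)nid;
	return true;
}
#endif
```

With a stub like this, callers in mm/vmscan.c would not need their own #ifdef CONFIG_CPUSETS guards.
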
>   static inline bool cpusets_enabled(void) { return false; }
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 0f910c828973..bb9669cc105d 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -4296,3 +4296,13 @@ void cpuset_task_status_allowed(struct seq_file *m, struct task_struct *task)
>   	seq_printf(m, "Mems_allowed_list:\t%*pbl\n",
>   		   nodemask_pr_args(&task->mems_allowed));
>   }
> +
> +bool memcg_mems_allowed(struct mem_cgroup *memcg, int nid)
> +{
> +	struct cgroup_subsys_state *css;
> +	struct cpuset *cs;
> +
> +	css = cgroup_get_e_css(memcg->css.cgroup, &cpuset_cgrp_subsys);
> +	cs = css ? container_of(css, struct cpuset, css) : NULL;
> +	return cs ? node_isset(nid, cs->effective_mems) : true;

As Johannes said, you will need to take the callback_lock to ensure
the stability of effective_mems. I also second his suggestion of
defining a cgroup_mems_allowed() here and doing the memcg-to-cgroup
translation outside of cpuset.c.
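
The locking pattern can be illustrated with a small userspace model. This is only a sketch under stated assumptions: a pthread mutex stands in for cpuset.c's callback_lock (a spinlock in the kernel), a plain bitmask stands in for nodemask_t, and the struct layouts and the cpuset_set_effective_mems() writer are hypothetical simplifications, not kernel code. In the kernel, the caller outside cpuset.c would pass memcg->css.cgroup.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; names mirror the kernel but the types are simplified. */
struct cpuset { unsigned long effective_mems; };	/* one bit per node */
struct cgroup { struct cpuset *cs; };			/* NULL: no cpuset attached */

/* Stand-in for cpuset.c's callback_lock (a spinlock in the kernel). */
static pthread_mutex_t callback_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: is node nid in the cgroup's effective mems?
 * Assumes 0 <= nid < number of bits in unsigned long.
 */
static bool cgroup_mems_allowed(struct cgroup *cgrp, int nid)
{
	bool allowed;

	/* No cpuset means no restriction, as in the posted patch. */
	if (!cgrp || !cgrp->cs)
		return true;

	/* Read effective_mems under the lock so a concurrent update is not seen half-done. */
	pthread_mutex_lock(&callback_lock);
	allowed = (cgrp->cs->effective_mems >> nid) & 1UL;
	pthread_mutex_unlock(&callback_lock);

	return allowed;
}

/* Hypothetical writer side: updates take the same lock. */
static void cpuset_set_effective_mems(struct cpuset *cs, unsigned long mask)
{
	pthread_mutex_lock(&callback_lock);
	cs->effective_mems = mask;
	pthread_mutex_unlock(&callback_lock);
}
```

Keeping the helper keyed on the cgroup rather than the memcg means cpuset.c does not need to know about struct mem_cgroup at all.
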

Cheers,
Longman

