Message-ID: <Z-zsGazxeHK9uaA6@slm.duckdns.org>
Date: Tue, 1 Apr 2025 21:49:45 -1000
From: Tejun Heo <tj@...nel.org>
To: Waiman Long <longman@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
	Michal Koutný <mkoutny@...e.com>,
	Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 01/10] cgroup/cpuset: Fix race between newly created
 partition and dying one

On Sun, Mar 30, 2025 at 05:52:39PM -0400, Waiman Long wrote:
> There is a possible race between removing a cgroup directory that is
> a partition root and the creation of a new partition.  The partition
> to be removed can be dying but still online. It does not currently
> participate in checking for exclusive CPU conflicts, but its exclusive
> CPUs are still there in subpartitions_cpus and isolated_cpus. These
> two cpumasks are global states that affect the operation of cpuset
> partitions. The exclusive CPUs in dying cpusets will only be removed
> when the cpuset_css_offline() function is called after an RCU delay.
> 
> As a result, it is possible that a new partition can be created with
> exclusive CPUs that overlap with those of a dying one. When that dying
> partition is finally offlined, it removes those overlapping exclusive
> CPUs from subpartitions_cpus and possibly isolated_cpus, resulting in an
> incorrect CPU configuration.
> 
> This bug was found when a warning was triggered in
> remote_partition_disable() during testing because the subpartitions_cpus
> mask was empty.
> 
> One possible way to fix this is to iterate the dying cpusets as well and
> avoid using the exclusive CPUs in those dying cpusets. However, this
> can still cause random partition creation failures or other anomalies
> due to racing. A better way to fix this race is to reset the partition
> state at the moment when a cpuset is being killed.
> 
> Introduce a new css_killed() CSS function pointer and call it, if
> defined, before setting the CSS_DYING flag in kill_css(). Also update the
> css_is_dying() helper to use the CSS_DYING flag introduced by commit
> 33c35aa48178 ("cgroup: Prevent kill_css() from being called more than
> once") for proper synchronization.
> 
> Add a new cpuset_css_killed() function to reset the partition state of
> a valid partition root if it is being killed.
> 
> Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
> Signed-off-by: Waiman Long <longman@...hat.com>
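
For readers following the thread, the race described in the commit message
can be illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the kernel code: every toy_* name is hypothetical,
a single 64-bit word stands in for the global subpartitions_cpus cpumask,
and the RCU delay is modeled simply as "offline runs after the new partition
has been created".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these names are illustrative, not the kernel's API. */
struct toy_cpuset {
	uint64_t exclusive_cpus;	/* one bit per CPU */
	bool is_partition_root;
	bool dying;
};

/* Stand-in for the global subpartitions_cpus mask. */
static uint64_t toy_subpartitions_cpus;

/* Create a partition; a dying cpuset is skipped by the conflict check. */
static bool toy_create_partition(struct toy_cpuset *cs, uint64_t cpus,
				 const struct toy_cpuset *other)
{
	if (other && !other->dying && other->is_partition_root &&
	    (other->exclusive_cpus & cpus))
		return false;
	cs->exclusive_cpus = cpus;
	cs->is_partition_root = true;
	cs->dying = false;
	toy_subpartitions_cpus |= cpus;
	return true;
}

/* Drop a partition's CPUs from the global mask; no-op if already reset. */
static void toy_reset_partition(struct toy_cpuset *cs)
{
	if (!cs->is_partition_root)
		return;
	toy_subpartitions_cpus &= ~cs->exclusive_cpus;
	cs->exclusive_cpus = 0;
	cs->is_partition_root = false;
}

/* Kill: optionally run the "killed" hook before marking the cpuset dying. */
static void toy_kill(struct toy_cpuset *cs, bool use_killed_hook)
{
	if (use_killed_hook)
		toy_reset_partition(cs);
	cs->dying = true;
}

/* Offline: runs later, after the modeled RCU delay. */
static void toy_offline(struct toy_cpuset *cs)
{
	toy_reset_partition(cs);
}

/* Replay the racy sequence and return the final global mask. */
uint64_t toy_simulate(bool use_killed_hook)
{
	struct toy_cpuset old_cs = {0}, fresh_cs = {0};
	bool ok;

	toy_subpartitions_cpus = 0;
	ok = toy_create_partition(&old_cs, 0x0c, NULL);	/* CPUs 2-3 */
	assert(ok);
	toy_kill(&old_cs, use_killed_hook);
	ok = toy_create_partition(&fresh_cs, 0x0c, &old_cs);
	assert(ok);
	(void)ok;
	toy_offline(&old_cs);	/* delayed cleanup of the dying cpuset */
	return toy_subpartitions_cpus;
}
```

In this model, toy_simulate(false) ends with an empty global mask even though
the new partition still believes it owns CPUs 2-3, which resembles the empty
subpartitions_cpus condition mentioned above. With toy_simulate(true), the
state is reset at kill time, the delayed offline becomes a no-op, and the
new partition's CPUs remain in the mask.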

Applied to cgroup/for-6.15-fixes.

Thanks.

-- 
tejun
