[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-shjD2OwHJPI0vG@slm.duckdns.org>
Date: Mon, 31 Mar 2025 13:13:16 -1000
From: Tejun Heo <tj@...nel.org>
To: Waiman Long <longman@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Michal Koutný <mkoutny@...e.com>,
Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 01/10] cgroup/cpuset: Fix race between newly created
partition and dying one
Hello,
On Sun, Mar 30, 2025 at 05:52:39PM -0400, Waiman Long wrote:
...
> One possible way to fix this is to iterate the dying cpusets as well and
> avoid using the exclusive CPUs in those dying cpusets. However, this
> can still cause random partition creation failures or other anomalies
> due to racing. A better way to fix this race is to reset the partition
> state at the moment when a cpuset is being killed.
I'm not a big fan of adding another method call in the destruction path.
css_offline() is where the kill can be seen from all CPUs and notified to
the controller and I'm not sure why bringing it sooner would be necessary to
close the race window. Can't the creation side drain the cgroups that are
going down if the asynchronous part is a problem? e.g. We already have
cgroup_lock_and_drain_offline() which isn't the most scalable thing but
partition operations aren't very frequent, right? And if that's a problem,
there should be a way to make it reasonably quicker.
Thanks.
--
tejun
Powered by blists - more mailing lists