Message-ID: <915d1261-ee9f-4080-a338-775982e1c48d@redhat.com>
Date: Mon, 31 Mar 2025 23:12:06 -0400
From: Waiman Long <llong@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 01/10] cgroup/cpuset: Fix race between newly created
partition and dying one
On 3/31/25 7:13 PM, Tejun Heo wrote:
> Hello,
>
> On Sun, Mar 30, 2025 at 05:52:39PM -0400, Waiman Long wrote:
> ...
>> One possible way to fix this is to iterate the dying cpusets as well and
>> avoid using the exclusive CPUs in those dying cpusets. However, this
>> can still cause random partition creation failures or other anomalies
>> due to racing. A better way to fix this race is to reset the partition
>> state at the moment when a cpuset is being killed.
> I'm not a big fan of adding another method call in the destruction path.
> css_offline() is where the kill can be seen from all CPUs and notified to
> the controller and I'm not sure why bringing it sooner would be necessary to
> close the race window. Can't the creation side drain the cgroups that are
> going down if the asynchronous part is a problem? e.g. We already have
> cgroup_lock_and_drain_offline() which isn't the most scalable thing but
> partition operations aren't very frequent, right? And if that's a problem,
> there should be a way to make it reasonably quicker.

The problem is the RCU delay between the time a cgroup is killed and
enters the dying state and the time its partition is deactivated when
cpuset_css_offline() is called. That delay can be rather lengthy,
depending on the current workload.

Another alternative I can think of is to scan the remote partition list
(for a remote partition) or the sibling cpusets (for a local partition)
whenever a conflict is detected while enabling a partition. When a dying
cpuset partition is found, deactivate it immediately to resolve the
conflict. Otherwise, the dying partition will still be deactivated at
cpuset_css_offline() time.
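
To make the alternative more concrete, here is a minimal userspace sketch of the local-partition (sibling) case. It is not kernel code and is not the eventual patch. `struct cpuset_model`, `enable_partition()` and `deactivate_partition()` are hypothetical names. CPUs are modeled as a 64-bit mask instead of a cpumask, and the `dying` flag stands in for whatever check the kernel would actually use (for example `css_is_dying()`). The point it illustrates is that dying siblings are only examined after a conflict has been found, so the common, conflict-free path does no extra work. A remote partition would presumably be handled the same way by walking the remote partition list instead of the siblings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a cpuset; NOT the kernel's struct cpuset. */
struct cpuset_model {
	uint64_t excl_cpus;	/* exclusive CPUs held as a partition root (bit n = CPU n) */
	bool is_partition;	/* currently a valid partition root */
	bool dying;		/* killed, but css_offline has not run yet */
};

/* Give up the partition and its exclusive CPUs (what offline would do later). */
static void deactivate_partition(struct cpuset_model *cs)
{
	cs->is_partition = false;
	cs->excl_cpus = 0;
}

/*
 * Try to make @cs a partition root owning @want, given its @n siblings.
 * Dying siblings are only considered once a conflict is detected; an
 * overlapping dying partition is deactivated right away instead of
 * waiting for offline. A conflict with a live partition still fails.
 */
static bool enable_partition(struct cpuset_model *cs, uint64_t want,
			     struct cpuset_model *siblings, size_t n)
{
	uint64_t taken = 0;
	size_t i;

	assert(cs != NULL && (siblings != NULL || n == 0));

	/* Fast path: collect CPUs owned by all sibling partitions. */
	for (i = 0; i < n; i++)
		if (siblings[i].is_partition)
			taken |= siblings[i].excl_cpus;

	if (taken & want) {
		/* Conflict: recompute, releasing overlapping dying partitions. */
		taken = 0;
		for (i = 0; i < n; i++) {
			struct cpuset_model *sib = &siblings[i];

			if (!sib->is_partition)
				continue;
			if (sib->dying && (sib->excl_cpus & want))
				deactivate_partition(sib);
			else
				taken |= sib->excl_cpus;
		}
		if (taken & want)
			return false;	/* a live partition really owns these CPUs */
	}

	cs->is_partition = true;
	cs->excl_cpus = want;
	return true;
}
```

With a dying sibling holding CPUs 0-1 and a live sibling holding CPUs 2-3, enabling a new partition on CPUs 0-1 succeeds after the dying sibling is deactivated, while a request that includes CPU 2 still fails. Deactivating a dying partition on a path that ends up failing anyway loses nothing, since it would have been deactivated at cpuset_css_offline() time regardless.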

That will be a bit more complex, but I think it can still solve the
problem without adding a new method. What do you think? If you are OK
with that, I will send out a new patch later this week.
Thanks,
Longman