Message-ID: <d8d7c633-a9e3-4990-8904-4c7710894789@redhat.com>
Date: Tue, 1 Apr 2025 16:56:40 -0400
From: Waiman Long <llong@...hat.com>
To: Waiman Long <llong@...hat.com>, Tejun Heo <tj@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 01/10] cgroup/cpuset: Fix race between newly created
partition and dying one
On 4/1/25 4:41 PM, Waiman Long wrote:
>
> On 4/1/25 3:59 PM, Tejun Heo wrote:
>> Hello, Waiman.
>>
>> On Mon, Mar 31, 2025 at 11:12:06PM -0400, Waiman Long wrote:
>>> The problem is the RCU delay between the time a cgroup is killed and
>>> is in a
>>> dying state and when the partition is deactivated when
>>> cpuset_css_offline()
>>> is called. That delay can be rather lengthy depending on the current
>>> workload.
>> If we don't have to do it too often, synchronize_rcu_expedited() may be
>> workable too. What do you think?
>
> I don't think we ever call synchronize_rcu() in the cgroup code except
> for rstat flush. In fact, we didn't use to have an easy way to know if
> there were dying cpusets hanging around. Now we can probably use the
> root cgroup's nr_dying_subsys[cpuset_cgrp_id] to know if we need to
> use synchronize_rcu*() call to wait for it. However, I still need to
> check if there is any racing window that will cause us to miss it.
Sorry, I don't think I can use synchronize_rcu_expedited(), as the use
case that I am seeing most often is the creation of isolated partitions
running latency-sensitive applications like DPDK. Using
synchronize_rcu_expedited() will send IPIs to all the CPUs, which may
break the required latency guarantee for those applications. Just using
synchronize_rcu(), however, will have unpredictable latency, impacting
user experience.
>
>>
>>> Another alternative that I can think of is to scan the remote
>>> partition list
>>> for remote partition and sibling cpusets for local partition
>>> whenever some
>>> kind of conflicts are detected when enabling a partition. When a dying
>>> cpuset partition is detected, deactivate it immediately to resolve the
>>> conflict. Otherwise, the dying partition will still be deactivated at
>>> cpuset_css_offline() time.
>>>
>>> That will be a bit more complex and I think can still get the
>>> problem solved
>>> without adding a new method. What do you think? If you are OK with
>>> that, I
>>> will send out a new patch later this week.
>> If synchronize_rcu_expedited() won't do, let's go with the original
>> patch.
>> The operation does make general sense in that it's for a distinctive
>> step in
>> the destruction process although I'm a bit curious why it's called
>> before
>> DYING is set.
>
Because of the above, I still prefer either using the original patch or
scanning for dying cpuset partitions when a conflict is detected.
Please let me know what you think about it.
Thanks,
Longman