Message-ID: <d9c96490-98bf-406b-8324-6cf86a536433@redhat.com>
Date: Tue, 1 Apr 2025 16:41:55 -0400
From: Waiman Long <llong@...hat.com>
To: Tejun Heo <tj@...nel.org>, Waiman Long <llong@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Koutný
 <mkoutny@...e.com>, Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 01/10] cgroup/cpuset: Fix race between newly created
 partition and dying one


On 4/1/25 3:59 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, Mar 31, 2025 at 11:12:06PM -0400, Waiman Long wrote:
>> The problem is the RCU delay between the time a cgroup is killed and
>> enters the dying state, and the time its partition is deactivated when
>> cpuset_css_offline() is called. That delay can be rather lengthy
>> depending on the current workload.
> If we don't have to do it too often, synchronize_rcu_expedited() may be
> workable too. What do you think?

I don't think we ever call synchronize_rcu() in the cgroup code except 
for the rstat flush. In fact, we used to have no easy way to know 
whether there were dying cpusets hanging around. Now we can probably 
check the root cgroup's nr_dying_subsys[cpuset_cgrp_id] to decide 
whether a synchronize_rcu*() call is needed to wait for them. However, 
I still need to check whether there is any race window that could 
cause us to miss one.
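
To make that concrete, something like the sketch below is what I have 
in mind. This is illustrative only: the helper name is made up, and I 
am assuming the counter can be read safely at this point.

	/*
	 * Hypothetical helper: only pay for a grace period when there
	 * may still be dying cpuset csses around.
	 */
	static void cpuset_wait_for_dying_csses(void)
	{
		if (READ_ONCE(cgrp_dfl_root.cgrp.nr_dying_subsys[cpuset_cgrp_id]))
			synchronize_rcu_expedited();
	}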

>
>> Another alternative I can think of is to scan the remote partition
>> list for a remote partition, or the sibling cpusets for a local
>> partition, whenever some kind of conflict is detected when enabling a
>> partition. When a dying cpuset partition is detected, deactivate it
>> immediately to resolve the conflict. Otherwise, the dying partition
>> will still be deactivated at cpuset_css_offline() time.
>>
>> That will be a bit more complex, but I think it can still solve the
>> problem without adding a new method. What do you think? If you are OK
>> with that, I will send out a new patch later this week.
> If synchronize_rcu_expedited() won't do, let's go with the original
> patch. The operation does make general sense in that it's a distinct
> step in the destruction process, although I'm a bit curious why it's
> called before DYING is set.

Again, we have to synchronize the css_is_dying() check in 
is_cpuset_online(), which is used by cpuset_for_each_child(), against 
the invocation of cpuset_css_killed(). Since the setting of the 
CSS_DYING flag is protected by cgroup_mutex while most of the cpuset 
code is protected by cpuset_mutex, the two operations can be 
asynchronous with respect to each other. So I have to make sure that 
by the time CSS_DYING is set, cpuset_css_killed() has already been 
invoked. I will need a similar check if we decide to use 
synchronize_rcu*() to wait for the completion of the 
cpuset_css_offline() call.
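
In other words, the ordering I want to guarantee looks roughly like 
this (an illustrative fragment, not the actual patch):

	/*
	 * cgroup core, under cgroup_mutex:
	 *   1) invoke the ->css_killed() callback (cpuset_css_killed()),
	 *      which deactivates the dying partition;
	 *   2) only then set CSS_DYING on the css.
	 *
	 * cpuset side, under cpuset_mutex:
	 */
	cpuset_for_each_child(cp, pos_css, parent) {
		if (!is_cpuset_online(cp))	/* css_is_dying() inside */
			continue;
		/* conflict checks against live sibling partitions */
	}

If a child could be skipped here with CSS_DYING set before 
cpuset_css_killed() has run, its partition would still hold CPUs while 
the iteration treats it as gone, which is exactly the race.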

As I also deal with a lot of locking-related issues, I am more attuned 
to this kind of race condition and want to make sure nothing bad will 
happen.

Cheers,
Longman


