Message-ID: <a6c110e8-8170-4bf6-a845-57253b20275b@huaweicloud.com>
Date: Sat, 31 Jan 2026 09:43:07 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Shuah Khan <shuah@...nel.org>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer
housekeeping_update() call from CPU hotplug to workqueue

On 2026/1/31 9:06, Waiman Long wrote:
>
> On 1/30/26 7:47 PM, Chen Ridong wrote:
>>
>> On 2026/1/30 23:42, Waiman Long wrote:
>>> The update_isolation_cpumasks() function can be called either directly
>>> from a regular cpuset control file write with cpuset_full_lock() held,
>>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
> Note this statement.
Thank you for the reminder.
>>>
>>> As we are going to enable dynamic updates to the nohz_full housekeeping
>>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>>> allowing the CPU hotplug path to call into housekeeping_update() directly
>>> from update_isolation_cpumasks() will likely cause a deadlock. So we
>>> have to defer any call to housekeeping_update() until after the CPU
>>> hotplug operation has finished. This is now done via a workqueue where
>>> the actual housekeeping_update() call, if needed, will happen after
>>> cpus_write_lock is released.
>>>
>>> We can't use the synchronous task_work API as calls from the CPU hotplug
>>> path happen in the per-cpu kthread of the CPU that is being shut down
>>> or brought up. Because of the asynchronous nature of workqueues, the
>>> HK_TYPE_DOMAIN housekeeping cpumask will be updated a bit later than the
>>> "cpuset.cpus.isolated" control file in this case.
>>>
>>> Also add a check in test_cpuset_prs.sh and modify some existing
>>> test cases to confirm that "cpuset.cpus.isolated" and HK_TYPE_DOMAIN
>>> housekeeping cpumask will both be updated.
>>>
>>> Signed-off-by: Waiman Long <longman@...hat.com>
>>> ---
>>>  kernel/cgroup/cpuset.c                        | 37 +++++++++++++++++--
>>>  .../selftests/cgroup/test_cpuset_prs.sh       | 13 +++++--
>>>  2 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 7b7d12ab1006..0b0eb1df09d5 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -84,6 +84,9 @@ static cpumask_var_t isolated_cpus;
>>>   */
>>>  static bool isolated_cpus_updating;
>>>  
>>> +/* Both cpuset_mutex and cpus_read_lock acquired */
>>> +static bool cpuset_locked;
>>> +
>>>  /*
>>>   * A flag to force sched domain rebuild at the end of an operation.
>>>   * It can be set in
>>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>>>  {
>>>          cpus_read_lock();
>>>          mutex_lock(&cpuset_mutex);
>>> +        cpuset_locked = true;
>>>  }
>>>  
>>>  void cpuset_full_unlock(void)
>>>  {
>>> +        cpuset_locked = false;
>>>          mutex_unlock(&cpuset_mutex);
>>>          cpus_read_unlock();
>>>  }
>>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>>          return false;
>>>  }
>>>  
>>> +static void isolcpus_workfn(struct work_struct *work)
>>> +{
>>> +        cpuset_full_lock();
>>> +        if (isolated_cpus_updating) {
>>> +                WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>>> +                isolated_cpus_updating = false;
>>> +        }
>>> +        cpuset_full_unlock();
>>> +}
>>> +
>>>  /*
>>>   * update_isolation_cpumasks - Update external isolation related CPU masks
>>>   *
>>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>>>   */
>>>  static void update_isolation_cpumasks(void)
>>>  {
>>> -        int ret;
>>> +        static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>>  
>>>          if (!isolated_cpus_updating)
>>>                  return;
>>>  
>> Can this happen?
>>
>> cpu0                                      cpu1
>> [...]
>>
>> isolated_cpus_updating = true;
>> ...
>>                                           // 'full_lock' is not acquired
>>                                           update_isolation_cpumasks
> That is not true. Either cpus_read_lock or cpus_write_lock, together with
> cpuset_mutex, is held when update_isolation_cpumasks() is called. So there
> is mutual exclusion.
Eh, we currently assume that it can only be called from the existing call
sites, so it's okay for now. But I'm concerned that if we later call
update_isolation_cpumasks() without realizing that we need to hold either
cpus_write_lock or (cpus_read_lock && cpuset_mutex), we could run into
concurrency issues. Maybe I'm worrying too much.
Maybe we should add a 'lockdep_assert_held' inside update_isolation_cpumasks().
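Something like the following sketch (on top of this patch, reusing the lock
names already in cpuset.c; illustrative only, not tested) would both document
and enforce that contract:

        static void update_isolation_cpumasks(void)
        {
                static DECLARE_WORK(isolcpus_work, isolcpus_workfn);

                /* Sketch: assert the locking contract described above */
                lockdep_assert_cpus_held();
                lockdep_assert_held(&cpuset_mutex);

                if (!isolated_cpus_updating)
                        return;
                ...

lockdep_assert_cpus_held() is satisfied by either read or write ownership of
cpu_hotplug_lock, so it covers both of the call paths you mentioned.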
>> // exec worker concurrently
>> isolcpus_workfn
>>   cpuset_full_lock
>>   isolated_cpus_updating = false;
>>   cpuset_full_unlock();
>>                                           // This returns incorrectly
>>                                           if (!isolated_cpus_updating)
>>                                                   return;
>>
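For the record, writing out the interleaving with the locks that the current
callers actually hold (names taken from this patch; just an illustration)
shows why the window above cannot open today:

        /* setter side: control-file write or hotplug, cpuset_mutex held */
        isolated_cpus_updating = true;
        ...
        update_isolation_cpumasks();  /* flag still true; presumably queues
                                         isolcpus_work here */
        /* cpuset_mutex released after this */

        /* worker side: isolcpus_workfn() */
        cpuset_full_lock();           /* waits for cpuset_mutex above */
        if (isolated_cpus_updating) { /* flag re-checked under the lock */
                WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
                isolated_cpus_updating = false;
        }
        cpuset_full_unlock();

The flag is only ever set and cleared with cpuset_mutex held, so the race I
described needs a caller that skips the locks, which is exactly what the
lockdep assertion above would catch.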
> Cheers,
> Longman
>
--
Best regards,
Ridong