Message-ID: <781c0d8e-7cb6-4f3e-913a-b2a6b0bfed5e@redhat.com>
Date: Fri, 30 Jan 2026 20:45:52 -0500
From: Waiman Long <llong@...hat.com>
To: Chen Ridong <chenridong@...weicloud.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Shuah Khan <shuah@...nel.org>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH/for-next v2 1/2] cgroup/cpuset: Defer
housekeeping_update() call from CPU hotplug to workqueue
On 1/30/26 7:58 PM, Chen Ridong wrote:
>
> On 2026/1/30 23:42, Waiman Long wrote:
>> The update_isolation_cpumasks() function can be called either directly
>> from regular cpuset control file write with cpuset_full_lock() called
>> or via the CPU hotplug path with cpus_write_lock and cpuset_mutex held.
>>
>> As we are going to enable dynamic update to the nohz_full housekeeping
>> cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
>> allowing the CPU hotplug path to call into housekeeping_update() directly
>> from update_isolation_cpumasks() will likely cause deadlock. So we
>> have to defer any call to housekeeping_update() after the CPU hotplug
>> operation has finished. This is now done via the workqueue where
>> the actual housekeeping_update() call, if needed, will happen after
>> cpus_write_lock is released.
>>
>> We can't use the synchronous task_work API as call from CPU hotplug
>> path happen in the per-cpu kthread of the CPU that is being shut down
>> or brought up. Because of the asynchronous nature of workqueue, the
>> HK_TYPE_DOMAIN housekeeping cpumask will be updated a bit later than the
>> "cpuset.cpus.isolated" control file in this case.
>>
>> Also add a check in test_cpuset_prs.sh and modify some existing
>> test cases to confirm that "cpuset.cpus.isolated" and HK_TYPE_DOMAIN
>> housekeeping cpumask will both be updated.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> kernel/cgroup/cpuset.c | 37 +++++++++++++++++--
>> .../selftests/cgroup/test_cpuset_prs.sh | 13 +++++--
>> 2 files changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 7b7d12ab1006..0b0eb1df09d5 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -84,6 +84,9 @@ static cpumask_var_t isolated_cpus;
>> */
>> static bool isolated_cpus_updating;
>>
>> +/* Both cpuset_mutex and cpus_read_locked acquired */
>> +static bool cpuset_locked;
>> +
>> /*
>> * A flag to force sched domain rebuild at the end of an operation.
>> * It can be set in
>> @@ -285,10 +288,12 @@ void cpuset_full_lock(void)
>> {
>> cpus_read_lock();
>> mutex_lock(&cpuset_mutex);
>> + cpuset_locked = true;
>> }
>>
>> void cpuset_full_unlock(void)
>> {
>> + cpuset_locked = false;
>> mutex_unlock(&cpuset_mutex);
>> cpus_read_unlock();
>> }
>> @@ -1285,6 +1290,16 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>> return false;
>> }
>>
>> +static void isolcpus_workfn(struct work_struct *work)
>> +{
>> + cpuset_full_lock();
>> + if (isolated_cpus_updating) {
>> + WARN_ON_ONCE(housekeeping_update(isolated_cpus) < 0);
>> + isolated_cpus_updating = false;
>> + }
>> + cpuset_full_unlock();
>> +}
>> +
>> /*
>> * update_isolation_cpumasks - Update external isolation related CPU masks
>> *
>> @@ -1293,14 +1308,30 @@ static bool prstate_housekeeping_conflict(int prstate, struct cpumask *new_cpus)
>> */
>> static void update_isolation_cpumasks(void)
>> {
>> - int ret;
>> + static DECLARE_WORK(isolcpus_work, isolcpus_workfn);
>>
>> if (!isolated_cpus_updating)
>> return;
>>
>> - ret = housekeeping_update(isolated_cpus);
>> - WARN_ON_ONCE(ret < 0);
>> + /*
>> + * This function can be reached either directly from regular cpuset
>> + * control file write (cpuset_locked) or via hotplug (cpus_write_lock
>> + * && cpuset_mutex held). In the latter case, we defer the
>> + * housekeeping_update() call to the system_unbound_wq to avoid the
>> + * possibility of deadlock. This also means that there will be a short
>> + * period of time where HK_TYPE_DOMAIN housekeeping cpumask will lag
>> + * behind isolated_cpus.
>> + */
>> + if (!cpuset_locked) {
> Adding a global variable makes this difficult to handle, especially in
> concurrent scenarios, since we could read it outside of a critical region.
No, cpuset_locked is always read or written inside a critical section.
It is protected by cpuset_mutex up to this point, and by
cpuset_top_mutex once the next patch is applied.
>
> I suggest removing cpuset_locked and adding async_update_isolation_cpumasks
> instead, which can indicate to the caller it should call without holding the
> full lock.
The point of this global variable is to distinguish calls made from CPU
hotplug from calls made on the other regular cpuset code paths. The only
difference between the two is whether cpus_read_lock or cpus_write_lock
is held. That is why I think setting a global variable in
cpuset_full_lock() is the easy way. Otherwise, we will have to add an
extra argument to some of the functions to distinguish these two cases.
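
To make the flag-based distinction concrete, here is a minimal user-space
sketch of the idea, not the kernel code itself. All demo_* names are
hypothetical stand-ins: a pthread mutex plays the role of cpuset_mutex,
demo_locked mirrors cpuset_locked, and demo_workfn() stands in for the
work item that the patch queues on system_unbound_wq. The CPU hotplug
lock is not modeled at all.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool demo_locked;          /* mirrors cpuset_locked */
static bool demo_updating;        /* mirrors isolated_cpus_updating */
static bool demo_work_pending;    /* models a queued work item */
static int demo_sync_updates;     /* updates done directly by the caller */
static int demo_deferred_updates; /* updates done later by the work function */

/* Regular path: take the mutex and mark the "full lock" as held. */
static void demo_full_lock(void)
{
	pthread_mutex_lock(&demo_mutex);
	demo_locked = true;
}

static void demo_full_unlock(void)
{
	demo_locked = false;
	pthread_mutex_unlock(&demo_mutex);
}

/* Hotplug-like path: the mutex is held but the flag is never set. */
static void demo_hotplug_lock(void)
{
	pthread_mutex_lock(&demo_mutex);
}

static void demo_hotplug_unlock(void)
{
	pthread_mutex_unlock(&demo_mutex);
}

/* Must be called with demo_mutex held, so the flag reads are serialized. */
static void demo_update_masks(void)
{
	if (!demo_updating)
		return;

	if (!demo_locked) {
		/* Hotplug-like caller: defer and leave demo_updating set. */
		demo_work_pending = true;
		return;
	}

	/* Regular caller: do the update right away. */
	demo_sync_updates++;
	demo_updating = false;
}

/* Stand-in for the work function; runs after the hotplug-like path ends. */
static void demo_workfn(void)
{
	demo_full_lock();
	demo_work_pending = false;
	if (demo_updating) {
		demo_deferred_updates++;
		demo_updating = false;
	}
	demo_full_unlock();
}
```

In this sketch, a caller that goes through demo_full_lock() gets the
update done synchronously, while a caller that only uses
demo_hotplug_lock() leaves demo_updating set until demo_workfn() runs.
The flag is only ever touched with demo_mutex held, and no function
needs an extra argument to tell the two callers apart.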
Cheers,
Longman