Message-ID: <49b9bd3b-6a50-48f5-a06c-3e530819d2c8@redhat.com>
Date: Mon, 9 Feb 2026 14:58:03 -0500
From: Waiman Long <llong@...hat.com>
To: Chen Ridong <chenridong@...weicloud.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutný
 <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Anna-Maria Behnsen <anna-maria@...utronix.de>,
 Frederic Weisbecker <frederic@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, Shuah Khan <shuah@...nel.org>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org
Subject: Re: [PATCH/for-next v4 1/4] cgroup/cpuset: Clarify exclusion rules
 for cpuset internal variables

On 2/8/26 10:41 PM, Chen Ridong wrote:
>
> On 2026/2/7 4:37, Waiman Long wrote:
>> Clarify the locking rules associated with file level internal variables
>> inside the cpuset code. There is no functional change.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 105 ++++++++++++++++++++++++-----------------
>>   1 file changed, 61 insertions(+), 44 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index c43efef7df71..a4c6386a594d 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -61,6 +61,58 @@ static const char * const perr_strings[] = {
>>   	[PERR_REMOTE]    = "Have remote partition underneath",
>>   };
>>   
>> +/*
>> + * CPUSET Locking Convention
>> + * -------------------------
>> + *
>> + * Below are the three global locks guarding cpuset structures in lock
>> + * acquisition order:
>> + *  - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock)
>> + *  - cpuset_mutex
>> + *  - callback_lock (raw spinlock)
>> + *
>> + * A task must hold all the three locks to modify externally visible or
>> + * used fields of cpusets, though some of the internally used cpuset fields
>> + * and internal variables can be modified without holding callback_lock. If only
>> + * reliable read access of the externally used fields are needed, a task can
>> + * hold either cpuset_mutex or callback_lock which are exposed to other
>> + * external subsystems.
>> + *
>> + * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others,
>> + * ensuring that it is the only task able to also acquire callback_lock and
>> + * be able to modify cpusets.  It can perform various checks on the cpuset
>> + * structure first, knowing nothing will change. It can also allocate memory
>> + * without holding callback_lock. While it is performing these checks, various
>> + * callback routines can briefly acquire callback_lock to query cpusets.  Once
>> + * it is ready to make the changes, it takes callback_lock, blocking everyone
>> + * else.
>> + *
>> + * Calls to the kernel memory allocator cannot be made while holding
>> + * callback_lock which is a spinlock, as the memory allocator may sleep or
>> + * call back into cpuset code and acquire callback_lock.
>> + *
>> + * Now, the task_struct fields mems_allowed and mempolicy may be changed
>> + * by other task, we use alloc_lock in the task_struct fields to protect
>> + * them.
>> + *
>> + * The cpuset_common_seq_show() handlers only hold callback_lock across
>> + * small pieces of code, such as when reading out possibly multi-word
>> + * cpumasks and nodemasks.
>> + */
>> +
>> +static DEFINE_MUTEX(cpuset_mutex);
>> +
>> +/*
>> + * File level internal variables below follow one of the following exclusion
>> + * rules.
>> + *
>> + * RWCS: Read/write-able by holding either cpus_write_lock or both
>> + *       cpus_read_lock and cpuset_mutex.
>> + *
> Does this mean that variables can be read or written only by holding
> cpus_write_lock?
>
> I believe that to write cpuset variables, we must hold either (cpus_write_lock
> and cpuset_mutex) or (cpus_read_lock and cpuset_mutex).

The point of the locking rule is to state the condition for
mutual exclusion. Once cpus_write_lock is held, no other task can hold
cpus_read_lock and cpuset_mutex. I consider holding cpuset_mutex
optional in that case, though almost all the cpuset internal variables
are accessed from the CPU hotplug side with both cpus_write_lock and
cpuset_mutex held. The only exception is force_sd_rebuild (sd_rebuild),
which can be set directly from the scheduling code without holding
cpuset_mutex. I can change it to "holding cpus_write_lock (and
optionally cpuset_mutex) or both cpus_read_lock and cpuset_mutex" if
that makes it clearer.

Cheers,
Longman

