Message-ID: <d6fd5e9c-d302-43db-ac89-7b09ab0770be@redhat.com>
Date: Mon, 3 Nov 2025 23:10:40 -0500
From: Waiman Long <llong@...hat.com>
To: Chen Ridong <chenridong@...weicloud.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutný
 <mkoutny@...e.com>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
 Chen Ridong <chenridong@...wei.com>, Gabriele Monaco <gmonaco@...hat.com>,
 Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated
 and nohz_full don't leave any housekeeping

On 11/3/25 9:19 PM, Chen Ridong wrote:
>
> On 2025/11/4 9:30, Waiman Long wrote:
>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>> domain isolated nor nohz full). This can be a problem for other
>> subsystems (e.g. the timer wheel migration).
>>
>> Prevent this configuration by blocking any assignment that would cause
>> the union of domain isolated cpus and nohz_full to cover all CPUs.
>>
>> [longman: Remove isolated_cpus_should_update() and rewrite the checking
>>   in update_prstate() and update_parent_effective_cpumask(), also add
>>   prstate_housekeeping_conflict() check in update_prstate() as
>>   suggested by Chen Ridong]
>>
>> Originally-by: Gabriele Monaco <gmonaco@...hat.com>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>>   kernel/cgroup/cpuset.c | 75 +++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 74 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da770dac955e..0c49905df394 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>>   	return isolcpus_updated;
>>   }
>>   
>> +/*
>> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
>> + * @add_cpus: cpu mask for cpus that are going to be isolated
>> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
>> + * Return: false if there is conflict, true otherwise
>> + *
>> + * If nohz_full is enabled and we have isolated CPUs, their combination must
>> + * still leave housekeeping CPUs.
>> + *
>> + * TBD: Should consider merging this function into
>> + *      prstate_housekeeping_conflict().
>> + */
>> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
>> +				     struct cpumask *del_cpus)
>> +{
>> +	cpumask_var_t full_hk_cpus;
>> +	int res = true;
>> +
>> +	if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
>> +		return true;
>> +
>> +	if (del_cpus && cpumask_weight_and(del_cpus,
>> +			housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
>> +		return true;
>> +
>> +	if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
>> +		return false;
>> +
>> +	cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
>> +		    housekeeping_cpumask(HK_TYPE_DOMAIN));
>> +	cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
>> +	cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
>> +	if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
>> +		res = false;
>> +
>> +	free_cpumask_var(full_hk_cpus);
>> +	return res;
>> +}
>> +
>>   static void update_isolation_cpumasks(bool isolcpus_updated)
>>   {
>>   	int ret;
>> @@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>>   	if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>>   	    cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>>   		return PERR_INVCPUS;
>> +	if ((new_prs == PRS_ISOLATED) &&
>> +	    !isolated_cpus_can_update(tmp->new_cpus, NULL))
>> +		return PERR_HKEEPING;
>>   
> Do we also need to check prstate_housekeeping_conflict here?
Right. I missed that. Will add that in the next version.
>
>>   	spin_lock_irq(&callback_lock);
>>   	isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
>> @@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>>   		else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>>   			 cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>>   			cs->prs_err = PERR_NOCPUS;
>> +		else if ((prs == PRS_ISOLATED) &&
>> +			 !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
>> +			cs->prs_err = PERR_HKEEPING;
>>   		if (cs->prs_err)
>>   			goto invalidate;
>>   	}
> Ditto.

prstate_housekeeping_conflict() has already been called earlier via
validate_partition() from partition_cpus_change(), so we don't need
another check here. However, I forgot that enabling a partition does not
call validate_partition().


>
>> @@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>>   	int part_error = PERR_NONE;	/* Partition error? */
>>   	int isolcpus_updated = 0;
>>   	struct cpumask *xcpus = user_xcpus(cs);
>> +	int parent_prs = parent->partition_root_state;
>>   	bool nocpu;
>>   
>>   	lockdep_assert_held(&cpuset_mutex);
>> @@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>>   		if (prstate_housekeeping_conflict(new_prs, xcpus))
>>   			return PERR_HKEEPING;
>>   
>> +		if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
>> +		    !isolated_cpus_can_update(xcpus, NULL))
>> +			return PERR_HKEEPING;
>> +
>>   		if (tasks_nocpu_error(parent, cs, xcpus))
>>   			return PERR_NOCPUS;
>>   
> I think the isolated_cpus_can_update() check should also be added to validate_partition().
>
> If deemed necessary, you may consider applying the patch below, which reuses validate_partition()
> to enable the local partition, so that validate_partition() becomes the common validation block.
>
> https://lore.kernel.org/cgroups/20251025064844.495525-4-chenridong@huaweicloud.com/

I do think your patch series will make that simpler. You can certainly
update your patch series to include that additional check in
validate_partition().
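
For anyone following along, the mask arithmetic in
isolated_cpus_can_update() can be modelled in userspace roughly as
below. This is only a sketch under stated assumptions: it uses one bit
per CPU in a 64-bit word instead of struct cpumask, and the names
cpumask64, struct hk_state and can_isolate() are made up for
illustration; they are not kernel APIs. It mirrors the order of the
checks in the patch and says nothing about locking or allocation
failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, not struct cpumask. */
typedef uint64_t cpumask64;

/* Hypothetical snapshot of the state the kernel check reads. */
struct hk_state {
	bool nohz_full_enabled;    /* models housekeeping_enabled(HK_TYPE_KERNEL_NOISE) */
	cpumask64 hk_kernel_noise; /* models housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) */
	cpumask64 hk_domain;       /* models housekeeping_cpumask(HK_TYPE_DOMAIN) */
	cpumask64 isolated;        /* models isolated_cpus */
	cpumask64 active;          /* models cpu_active_mask */
};

/*
 * Return false if isolating the CPUs in @add would leave no CPU that is
 * neither domain isolated nor nohz_full, true otherwise.
 * @del holds the CPUs that stop being isolated (0 if none).
 */
bool can_isolate(const struct hk_state *s, cpumask64 add, cpumask64 del)
{
	cpumask64 full_hk;

	/* Without nohz_full there is nothing to conflict with. */
	if (!s->nohz_full_enabled)
		return true;

	/* Same early exit as the patch: a released CPU is not nohz_full. */
	if (del & s->hk_kernel_noise)
		return true;

	/* CPUs that are housekeeping in every sense and currently active. */
	full_hk = s->hk_kernel_noise & s->hk_domain;
	full_hk &= ~s->isolated;
	full_hk &= s->active;

	/* At least one of them must survive the new isolation request. */
	return (full_hk & ~add) != 0;
}
```

With four active CPUs of which CPUs 2-3 are nohz_full, this model
rejects a request to isolate CPUs 0-1 together and accepts a request to
isolate CPU 1 alone, since CPU 0 remains as housekeeping.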

Cheers,
Longman

