Message-ID: <d6fd5e9c-d302-43db-ac89-7b09ab0770be@redhat.com>
Date: Mon, 3 Nov 2025 23:10:40 -0500
From: Waiman Long <llong@...hat.com>
To: Chen Ridong <chenridong@...weicloud.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
Chen Ridong <chenridong@...wei.com>, Gabriele Monaco <gmonaco@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [cgroup/for-6.19 PATCH v2 2/3] cgroup/cpuset: Fail if isolated
and nohz_full don't leave any housekeeping
On 11/3/25 9:19 PM, Chen Ridong wrote:
>
> On 2025/11/4 9:30, Waiman Long wrote:
>> Currently the user can set up isolated cpus via cpuset and nohz_full in
>> such a way that leaves no housekeeping CPU (i.e. no CPU that is neither
>> domain isolated nor nohz full). This can be a problem for other
>> subsystems (e.g. the timer wheel migration).
>>
>> Prevent this configuration by blocking any assignment that would cause
>> the union of domain isolated cpus and nohz_full to cover all CPUs.
>>
>> [longman: Remove isolated_cpus_should_update() and rewrite the checking
>> in update_prstate() and update_parent_effective_cpumask(), also add
>> prstate_housekeeping_conflict() check in update_prstate() as
>> suggested by Chen Ridong]
>>
>> Originally-by: Gabriele Monaco <gmonaco@...hat.com>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> kernel/cgroup/cpuset.c | 75 +++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 74 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index da770dac955e..0c49905df394 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1393,6 +1393,45 @@ static bool partition_xcpus_del(int old_prs, struct cpuset *parent,
>> return isolcpus_updated;
>> }
>>
>> +/*
>> + * isolated_cpus_can_update - check for isolated & nohz_full conflicts
>> + * @add_cpus: cpu mask for cpus that are going to be isolated
>> + * @del_cpus: cpu mask for cpus that are no longer isolated, can be NULL
>> + * Return: false if there is conflict, true otherwise
>> + *
>> + * If nohz_full is enabled and we have isolated CPUs, their combination must
>> + * still leave housekeeping CPUs.
>> + *
>> + * TBD: Should consider merging this function into
>> + * prstate_housekeeping_conflict().
>> + */
>> +static bool isolated_cpus_can_update(struct cpumask *add_cpus,
>> + struct cpumask *del_cpus)
>> +{
>> + cpumask_var_t full_hk_cpus;
>> + int res = true;
>> +
>> + if (!housekeeping_enabled(HK_TYPE_KERNEL_NOISE))
>> + return true;
>> +
>> + if (del_cpus && cpumask_weight_and(del_cpus,
>> + housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
>> + return true;
>> +
>> + if (!alloc_cpumask_var(&full_hk_cpus, GFP_KERNEL))
>> + return false;
>> +
>> + cpumask_and(full_hk_cpus, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
>> + housekeeping_cpumask(HK_TYPE_DOMAIN));
>> + cpumask_andnot(full_hk_cpus, full_hk_cpus, isolated_cpus);
>> + cpumask_and(full_hk_cpus, full_hk_cpus, cpu_active_mask);
>> + if (!cpumask_weight_andnot(full_hk_cpus, add_cpus))
>> + res = false;
>> +
>> + free_cpumask_var(full_hk_cpus);
>> + return res;
>> +}
>> +
>> static void update_isolation_cpumasks(bool isolcpus_updated)
>> {
>> int ret;
>> @@ -1551,6 +1590,9 @@ static int remote_partition_enable(struct cpuset *cs, int new_prs,
>> if (!cpumask_intersects(tmp->new_cpus, cpu_active_mask) ||
>> cpumask_subset(top_cpuset.effective_cpus, tmp->new_cpus))
>> return PERR_INVCPUS;
>> + if ((new_prs == PRS_ISOLATED) &&
>> + !isolated_cpus_can_update(tmp->new_cpus, NULL))
>> + return PERR_HKEEPING;
>>
> Do we also need to check prstate_housekeeping_conflict here?
Right. I missed that. Will add that in the next version.
>
>> spin_lock_irq(&callback_lock);
>> isolcpus_updated = partition_xcpus_add(new_prs, NULL, tmp->new_cpus);
>> @@ -1650,6 +1692,9 @@ static void remote_cpus_update(struct cpuset *cs, struct cpumask *xcpus,
>> else if (cpumask_intersects(tmp->addmask, subpartitions_cpus) ||
>> cpumask_subset(top_cpuset.effective_cpus, tmp->addmask))
>> cs->prs_err = PERR_NOCPUS;
>> + else if ((prs == PRS_ISOLATED) &&
>> + !isolated_cpus_can_update(tmp->addmask, tmp->delmask))
>> + cs->prs_err = PERR_HKEEPING;
>> if (cs->prs_err)
>> goto invalidate;
>> }
> Ditto.
prstate_housekeeping_conflict() has been called earlier via
validate_partition() from partition_cpus_change(). We don't need one
more check here. However, I forgot that enabling a partition will not
call validate_partition().
>
>> @@ -1750,6 +1795,7 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>> int part_error = PERR_NONE; /* Partition error? */
>> int isolcpus_updated = 0;
>> struct cpumask *xcpus = user_xcpus(cs);
>> + int parent_prs = parent->partition_root_state;
>> bool nocpu;
>>
>> lockdep_assert_held(&cpuset_mutex);
>> @@ -1813,6 +1859,10 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
>> if (prstate_housekeeping_conflict(new_prs, xcpus))
>> return PERR_HKEEPING;
>>
>> + if ((new_prs == PRS_ISOLATED) && (new_prs != parent_prs) &&
>> + !isolated_cpus_can_update(xcpus, NULL))
>> + return PERR_HKEEPING;
>> +
>> if (tasks_nocpu_error(parent, cs, xcpus))
>> return PERR_NOCPUS;
>>
> I think the isolated_cpus_can_update() check should also be added to validate_partition().
>
> If deemed necessary, you may consider applying the patch below, which reuses validate_partition() to
> enable the local partition, so that validate_partition() can serve as the common block.
>
> https://lore.kernel.org/cgroups/20251025064844.495525-4-chenridong@huaweicloud.com/
I do think your patch series will make that simpler. You can certainly
update your patch series to add that additional check to
validate_partition().
Cheers,
Longman
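
For readers following the thread, the rule that isolated_cpus_can_update() enforces can be sketched in userspace C as below. This is only an illustration under stated assumptions, not the kernel code: mask_t, struct hk_state, can_isolate() and self_check() are hypothetical names, a 64-bit integer stands in for a cpumask, and a del mask of 0 stands in for a NULL del_cpus. The logic mirrors the quoted patch: the CPUs that are housekeeping for both kernel noise and domain, not already isolated, and active must not all be covered by the CPUs about to be isolated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t mask_t;

/* Hypothetical snapshot of the state the kernel check reads. */
struct hk_state {
	bool   noise_enabled; /* like housekeeping_enabled(HK_TYPE_KERNEL_NOISE) */
	mask_t hk_noise;      /* like housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) */
	mask_t hk_domain;     /* like housekeeping_cpumask(HK_TYPE_DOMAIN) */
	mask_t isolated;      /* CPUs already isolated via cpuset */
	mask_t active;        /* like cpu_active_mask */
};

/* Return false if isolating "add" would leave no full housekeeping CPU. */
static bool can_isolate(const struct hk_state *st, mask_t add, mask_t del)
{
	mask_t full_hk;

	/* Without nohz_full there is nothing to conflict with. */
	if (!st->noise_enabled)
		return true;

	/* A de-isolated CPU that is not nohz_full becomes housekeeping again. */
	if (del & st->hk_noise)
		return true;

	/* CPUs that are currently housekeeping in every respect. */
	full_hk = st->hk_noise & st->hk_domain & ~st->isolated & st->active;

	/* At least one of them must survive the new isolation request. */
	return (full_hk & ~add) != 0;
}

/* Example: 4 CPUs, with CPUs 2-3 assumed to be nohz_full. */
static void self_check(void)
{
	struct hk_state st = {
		.noise_enabled = true,
		.hk_noise  = 0x3, /* CPUs 0-1 are not nohz_full */
		.hk_domain = 0xf,
		.isolated  = 0x0,
		.active    = 0xf,
	};

	assert(can_isolate(&st, 0x1, 0));    /* CPU 1 stays housekeeping */
	assert(!can_isolate(&st, 0x3, 0));   /* CPUs 0-1 both isolated: rejected */
	assert(can_isolate(&st, 0x1, 0x2));  /* CPU 1 is being de-isolated */
}
```

In this sketch, a rejected request corresponds to the PERR_HKEEPING return in the patch. The early return on the del mask follows the patch as posted; whether that shortcut is sufficient in every case is a question for the actual review, not something this example settles.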