Message-ID: <8907b39b-6d30-4b56-b358-d63f9f625993@redhat.com>
Date: Tue, 26 Aug 2025 12:06:46 -0400
From: Waiman Long <llong@...hat.com>
To: Michal Koutný <mkoutny@...e.com>,
Waiman Long <llong@...hat.com>, Chen Ridong <chenridong@...weicloud.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH v3] sched/core: Skip user_cpus_ptr masking if no online CPU left
On 8/26/25 10:25 AM, Michal Koutný wrote:
> Hi.
>
> I had a look after a while (thanks for reminders Ridong).
>
> On Mon, Jul 21, 2025 at 11:28:15AM -0400, Waiman Long <llong@...hat.com> wrote:
>> This corner case as specified in Chen Ridong's patch only happens with a
>> cpuset v1 environment, but it is still the case that the default cpu
>> affinity of the root cgroup (with or without CONFIG_CGROUPS) will include
>> offline CPUs, if present.
> IIUC, the generic sched_setaffinity(2) is ready for that, simply
> returning an EINVAL.
The modified code will not be executed when called from
sched_setaffinity(), as the SCA_USER flag will be set.
In the described scenario, sched_setaffinity() was called without
failure as the request was valid at the time.
>
>> So it still makes sense to skip the sched_setaffinity() setting if
>> there is no online CPU left, though it will be much harder to have
>> such a condition without using cpuset v1.
> That sounds like there'd be no issue without cpuset v1 and the source of
> the warning has quite a telling comment:
>
> * fail. TODO: have a better way to handle failure here
> */
> WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
>
> The trouble is that this is from cpuset_attach() (cgroup_subsys.attach)
> where no errors are expected. So I'd say the place for the check should
> be earlier in cpuset_can_attach() [1]. I'm not sure if that's universally
> immune against cpu offlining but it'd be sufficient for the reported
> sequential offlining.
Cpuset v1 has no concept of an effective cpumask that excludes offline
CPUs unless the "cpuset_v2_mode" mount option is used. So when the
cpuset has no CPU left, its tasks will be force-migrated to its parent
and the __set_cpus_allowed_ptr() function will be invoked. The parent
will likely still have those offline CPUs in its cpus_allowed list, so
__set_cpus_allowed_ptr_locked() will be called with a mask containing
only offline CPUs, causing the warning. Migrating to the top_cpuset is
probably not needed to illustrate the problem.
Cheers,
Longman
> HTH,
> Michal
>
> [1] Although the error propagates, it ends up without recovery in
> remove_tasks_in_empty_cpuset() "only" as an error message. But that's
> likely all that can be done in this workfn context -- it's better than
> silently skipping the migration as a consequence of this patch.