Message-ID: <nqes55hiydw37qpt5mrqwzyhan5nxlzvuoccei4hmjloccr5xr@aqkuqerfwomc>
Date: Tue, 26 Aug 2025 16:25:03 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Waiman Long <llong@...hat.com>,
Chen Ridong <chenridong@...weicloud.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH v3] sched/core: Skip user_cpus_ptr masking if no online CPU left
Hi.
I had a look after a while (thanks for the reminders, Ridong).
On Mon, Jul 21, 2025 at 11:28:15AM -0400, Waiman Long <llong@...hat.com> wrote:
> This corner case as specified in Chen Ridong's patch only happens with a
> cpuset v1 environment, but it is still the case that the default cpu
> affinity of the root cgroup (with or without CONFIG_CGROUPS) will include
> offline CPUs, if present.
IIUC, the generic sched_setaffinity(2) already handles that by simply
returning EINVAL.
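For illustration, here is a minimal userspace sketch of that behaviour (not part of the patch). It assumes Linux with glibc, and it assumes that CPU `CPU_SETSIZE - 1` (1023) is not present on the machine. The helper name `set_affinity_to_cpu()` is hypothetical. Requesting a mask that contains no usable CPU makes sched_setaffinity(2) fail with EINVAL, and the task's existing affinity is left untouched.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: try to pin the calling thread to a single CPU.
 * Returns 0 on success, or the errno value from sched_setaffinity(2).
 * If that CPU is not present/online, the kernel rejects the mask with
 * EINVAL instead of applying an empty affinity.
 */
int set_affinity_to_cpu(int cpu)
{
	cpu_set_t mask;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);	/* stay inside cpu_set_t */
	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
		return errno;
	return 0;
}
```

On a machine where CPU 1023 does exist, the call would succeed and pin the thread, so the choice of CPU is the assumption to revisit.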
> So it still makes sense to skip the sched_setaffinity() setting if
> there is no online CPU left, though it will be much harder to have
> such a condition without using cpuset v1.
That sounds like there'd be no issue without cpuset v1, and the source of
the warning has quite a telling comment:
	 * fail. TODO: have a better way to handle failure here
	 */
	WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
The trouble is that this is from cpuset_attach() (cgroup_subsys.attach),
where no errors are expected. So I'd say the check belongs earlier, in
cpuset_can_attach() [1]. I'm not sure that's universally immune to CPU
offlining, but it would be sufficient for the reported sequential
offlining.
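To make the suggested split concrete, here is a toy model in plain C. It is not kernel code, and every name in it (`model_can_attach()`, `model_attach()`, `model_migrate()`, `cpumask64_t`) is hypothetical. CPU masks are 64-bit bitmaps. The fallible first phase rejects a cpuset whose CPUs are all offline, so the void second phase never sees an empty mask. The -EINVAL return value is an arbitrary choice for the sketch. The model also ignores CPU hotplug racing between the two phases, which is exactly the caveat above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy CPU mask: bit N set means CPU N (up to 64 CPUs). */
typedef uint64_t cpumask64_t;

struct model_task {
	cpumask64_t cpus_allowed;
};

/* Phase 1 (like ->can_attach()): may fail, so reject an empty effective mask here. */
int model_can_attach(cpumask64_t cpuset_cpus, cpumask64_t online)
{
	if ((cpuset_cpus & online) == 0)
		return -EINVAL;
	return 0;
}

/* Phase 2 (like ->attach()): returns void, relies on phase 1 having validated. */
void model_attach(struct model_task *task, cpumask64_t cpuset_cpus,
		  cpumask64_t online)
{
	cpumask64_t effective = cpuset_cpus & online;

	/* Stands in for the WARN_ON_ONCE(): an empty mask here would be a bug. */
	assert(effective != 0);
	task->cpus_allowed = effective;
}

/* Caller: run phase 2 only if phase 1 succeeded, and propagate its error. */
int model_migrate(struct model_task *task, cpumask64_t cpuset_cpus,
		  cpumask64_t online)
{
	int ret = model_can_attach(cpuset_cpus, online);

	if (ret)
		return ret;
	model_attach(task, cpuset_cpus, online);
	return 0;
}
```

With CPUs 0-1 online, migrating into a cpuset of CPUs {1,2} yields an affinity of {1}. A cpuset of CPUs {2,3} is refused up front, and the task keeps its previous mask.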
HTH,
Michal
[1] Although the error propagates, in remove_tasks_in_empty_cpuset() it
ends up "only" as an error message, without recovery. But that's likely
all that can be done in this workfn context; it's still better than
silently skipping the migration as a consequence of this patch.