[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220728144420.GA27407@blackbody.suse.cz>
Date: Thu, 28 Jul 2022 16:44:20 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus
affinity was explicitly set
Hello.
On Wed, Jul 27, 2022 at 08:58:14PM -0400, Waiman Long <longman@...hat.com> wrote:
> It was found that any change to the current cpuset hierarchy may reset
> the cpus_allowed list of the tasks in the affected cpusets to the
> default cpuset value even if those tasks have cpus affinity explicitly
> set by the users before.
I'm surprised this went so long unnoticed / unreported.
Could it be users relied on that implicit affinity reset?
> That is especially easy to trigger under a cgroup v2 environment where
> writing "+cpuset" to the root cgroup's cgroup.subtree_control file
> will reset the cpus affinity of all the processes in the system.
This should apply only to tasks that were extracted out of the root
cgroup, no? (OK, those are all processes practically.)
(Even without your second patch, the scope should be limited because of
src_cset==dst_cset check in cgroup_migrate_prepare_dst().)
> That is especially problematic in a nohz_full environment where the
> tasks running in the nohz_full CPUs usually have their cpus affinity
> explicitly set and will behave incorrectly if cpus affinity changes.
One could also argue that for such processes, cgroup hierarchy should be
first configured and only then they start and set own affinity.
> Fix this problem by adding a flag in the task structure to indicate that
> a task has their cpus affinity explicitly set before and make cpuset
> code not to change their cpus_allowed list unless the user chosen cpu
> list is no longer a subset of the cpus_allowed list of the cpuset itself.
I'm uneasy with the occasional revert of this flag, i.e. the task who
set their affinity would sometimes have it overwritten and sometimes
not (which might have been relied on, especially with writes into
cpuset.cpus).
(But I have no better answer than the counter-argument above since
there's no easier way to detect the implicit migrations.)
Also, is there similar effect with memory binding?
Thanks,
Michal
Powered by blists - more mailing lists