Message-ID: <c9d82d78-b47d-0f4a-f1ca-81df78e1baa6@redhat.com>
Date: Fri, 29 Jul 2022 14:31:00 -0400
From: Waiman Long <longman@...hat.com>
To: Valentin Schneider <vschneid@...hat.com>, Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Will Deacon <will@...nel.org>
Subject: Re: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set
On 7/29/22 10:50, Waiman Long wrote:
> On 7/29/22 10:15, Valentin Schneider wrote:
>> On 28/07/22 11:39, Tejun Heo wrote:
>>> Hello, Waiman.
>>>
>>> On Thu, Jul 28, 2022 at 05:04:19PM -0400, Waiman Long wrote:
>>>>> So, the patch you proposed is making the code remember one special
>>>>> aspect of user-requested configuration - whether it configured it or
>>>>> not - and trying to preserve that particular state as cpuset state
>>>>> changes. It addresses the immediate problem but it is a very partial
>>>>> approach. Let's say a task wants to be affined to one logical thread
>>>>> of each core and sets its mask to 0x5555. Now, let's say cpuset got
>>>>> enabled and enforced 0xff and affined the task to 0xff. After a
>>>>> while, the cgroup got more cpus allocated and its cpuset now has
>>>>> 0xfff. Ideally, what should happen is the task now having the
>>>>> effective mask of 0x555. In practice, though, it either would get
>>>>> 0xf55 or 0x55 depending on which way we decide to misbehave.
>>>> OK, I see what you want to accomplish. To fully address this issue,
>>>> we will need to have a new cpumask variable in the task structure
>>>> which will be allocated if sched_setaffinity() is ever called. I can
>>>> rework my patch to use this approach.
>>> Yeah, we'd need to track what the user requested separately from the
>>> currently effective cpumask. Let's make sure that the scheduler folks
>>> are on board before committing to the idea, though. Peter, Ingo, what
>>> do you guys think?
>>>
>> FWIW on the runtime overhead side of things I think it'll be OK, as
>> that should be just an extra mask copy in sched_setaffinity() and a
>> subset check / cpumask_and() in set_cpus_allowed_ptr(). The policy
>> side is a bit less clear (when, if ever, do we clear the user-defined
>> mask? Will it keep haunting us even after moving a task to a disjoint
>> cpuset partition?).
>
> The runtime overhead should be minimal. It is the behavioral side that
> we should be careful about. It is a change in existing behavior, and we
> don't want to surprise users. Currently, a task that sets its cpu
> affinity explicitly will have its affinity reset whenever there is any
> change to the cpuset it belongs to or a hotplug event touches any cpu
> in the current cpuset. The new behavior we are proposing here is that
> the kernel will try its best to keep the cpu affinity that the user
> requested within the constraints of the current cpuset as well as the
> cpu hotplug state.
>
>
>>
>> There's also if/how that new mask should be exposed, because attaching
>> a task to a cpuset will now yield a not-necessarily-obvious affinity -
>> e.g. in the thread affinity example above, if the initial affinity
>> setting was done ages ago by some system tool, IMO the user needs a
>> way to be able to expect/understand the result of 0x555 rather than
>> 0xfff.
>
> Users can use sched_getaffinity(2) to retrieve the current cpu
> affinity. It is up to users to set another one if they don't like the
> current one. I don't think we need to return what the previously
> requested cpu affinity is. They are supposed to know that, or they can
> set their own if they don't like it.
Looking at Will's series that introduced user_cpus_ptr, I think we can
overlay our proposal on top of that. So calling sched_setaffinity() will
also update user_cpus_ptr. We may still need a flag to indicate whether
user_cpus_ptr was set up because of sched_setaffinity() or due to a call
to force_compatible_cpus_allowed_ptr() from arm64 arch code. That will
make our work easier, as some of the infrastructure is already there. I
am looking forward to your feedback.
Thanks,
Longman