Message-ID: <2bb8c031-03a9-ef93-1505-6e7fbcc6d847@redhat.com>
Date: Tue, 17 Jan 2023 13:13:31 -0500
From: Waiman Long <longman@...hat.com>
To: Will Deacon <will@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Lai Jiangshan <jiangshanlai@...il.com>, qperret@...gle.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided
cpumask in sched_setaffinity()

On 1/17/23 11:08, Will Deacon wrote:
> Hi Waiman,
>
> On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
>> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
>> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
>> is currently used only by arm64 arch due to possible asymmetric CPU
>> setup. This patch extends its usage to save user provided cpumask
>> when sched_setaffinity() is called for all arches. With this patch
>> applied, user_cpus_ptr, once allocated after a successful call to
>> sched_setaffinity(), will only be freed when the task exits.
>>
>> Since user_cpus_ptr is supposed to be used for "requested
>> affinity", there is actually no point to save current cpu affinity in
>> restrict_cpus_allowed_ptr() if sched_setaffinity() has never been called.
>> Modify the logic to set user_cpus_ptr only in sched_setaffinity() and use
>> it in restrict_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
>> if defined but not changing it.
>>
>> This will be some changes in behavior for arm64 systems with asymmetric
>> CPUs in some corner cases. For instance, if sched_setaffinity()
>> has never been called and there is a cpuset change before
>> relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
>> follow what the cpuset allows but not what the previous cpu affinity
>> setting allows.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>> kernel/sched/core.c | 82 ++++++++++++++++++++------------------------
>> kernel/sched/sched.h | 7 ++++
>> 2 files changed, 44 insertions(+), 45 deletions(-)
> We've tracked this down as the cause of an arm64 regression in Android and I've
> reproduced the issue with mainline.
>
> Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
> the command-line, then the arch code will (amongst other things) call
> force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
> when exec()'ing a 32-bit or a 64-bit task respectively.

IOW, relax_compatible_cpus_allowed_ptr() can be called without a
previous force_compatible_cpus_allowed_ptr(). Right?

A possible optimization in this case is to add a bit flag in the
task_struct to indicate a previous call to
force_compatible_cpus_allowed_ptr(). Without that flag set,
relax_compatible_cpus_allowed_ptr() can return immediately.
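
A minimal user-space sketch of the flag idea, not kernel code. The struct
task_model type, the affinity_forced field and the 64-bit masks are
hypothetical stand-ins for task_struct and struct cpumask, and cpuset
masking and locking are left out entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few task_struct fields involved. */
struct task_model {
	uint64_t cpus_mask;	/* current affinity, one bit per CPU */
	uint64_t user_mask;	/* mask saved by sched_setaffinity(); 0 = never set */
	bool affinity_forced;	/* hypothetical flag: force_compatible() ran */
};

/* Model of force_compatible_cpus_allowed_ptr(): restrict and set the flag. */
void force_compatible(struct task_model *p, uint64_t compat_mask)
{
	uint64_t restricted;

	assert(p != NULL);
	restricted = p->cpus_mask & compat_mask;
	/* Simplification: use every compatible CPU if the intersection is empty. */
	p->cpus_mask = restricted ? restricted : compat_mask;
	p->affinity_forced = true;
}

/* Model of relax_compatible_cpus_allowed_ptr(): no-op unless the flag is set. */
void relax_compatible(struct task_model *p, uint64_t possible_mask)
{
	assert(p != NULL);
	if (!p->affinity_forced)
		return;		/* nothing was forced, leave the affinity alone */

	/* Restore the requested mask, or all possible CPUs if none was requested. */
	p->cpus_mask = p->user_mask ? p->user_mask : possible_mask;
	p->affinity_forced = false;
}
```

In this model, a task that was never force-restricted keeps its mask across
relax_compatible(), while a task that was restricted gets its wider mask back
even when no user mask was ever saved.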
>
> If you consider a system where everything is 64-bit but the cmdline option
> above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
> expected to do anything in this case, and the old code made sure of that:
>
>> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>>
>> /*
>> * Restore the affinity of a task @p which was previously restricted by a
>> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
>> - * @p->user_cpus_ptr.
>> + * call to force_compatible_cpus_allowed_ptr().
>> *
>> * It is the caller's responsibility to serialise this with any calls to
>> * force_compatible_cpus_allowed_ptr(@p).
>> */
>> void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>> {
>> - struct cpumask *user_mask = p->user_cpus_ptr;
>> - unsigned long flags;
>> + int ret;
>>
>> /*
>> - * Try to restore the old affinity mask. If this fails, then
>> - * we free the mask explicitly to avoid it being inherited across
>> - * a subsequent fork().
>> + * Try to restore the old affinity mask with __sched_setaffinity().
>> + * Cpuset masking will be done there too.
>> */
>> - if (!user_mask || !__sched_setaffinity(p, user_mask))
>> - return;
> ... since it returned early here if '!user_mask' ...
The flag bit will work like the user_mask check here.
>
>> -
>> - raw_spin_lock_irqsave(&p->pi_lock, flags);
>> - user_mask = clear_user_cpus_ptr(p);
>> - raw_spin_unlock_irqrestore(&p->pi_lock, flags);
>> -
>> - kfree(user_mask);
>> + ret = __sched_setaffinity(p, task_user_cpus(p));
>> + WARN_ON_ONCE(ret);
> ... however, now we end up going down into __sched_setaffinity() with
> task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
> of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
> ends up with the affinity mask of the online CPUs at the point of exec() and is
> unable to run on anything onlined later).

CPU hotplug should update the cpumasks of running applications to
whatever the cpuset allows.
>
> I've had a crack at fixing the code above to restore the old behaviour, and it
> seems to work for my basic tests (still pending confirmation from others):
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bb1ee6d7bdde..0d4a11384648 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
> void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
> {
> struct affinity_context ac = {
> - .new_mask = task_user_cpus(p),
> + .new_mask = p->user_cpus_ptr,
> .flags = 0,
> };
> - int ret;
>
> /*
> * Try to restore the old affinity mask with __sched_setaffinity().
> * Cpuset masking will be done there too.
> */
> - ret = __sched_setaffinity(p, &ac);
> - WARN_ON_ONCE(ret);
> + if (ac.new_mask)
> + WARN_ON_ONCE(__sched_setaffinity(p, &ac));
> }
>
> void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
>
> With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
> so I'd be inclined to remove it altogether tbh.
>
> What do you think?

The problem here is that force_compatible_cpus_allowed_ptr() can be
called without a matching relax_compatible_cpus_allowed_ptr() at the
end. So we may end up artificially restricting the number of CPUs that
can be used when running a 64-bit binary.

What do you think about the idea of having a bit flag to track that?

Cheers,
Longman