Message-ID: <2bb8c031-03a9-ef93-1505-6e7fbcc6d847@redhat.com>
Date:   Tue, 17 Jan 2023 13:13:31 -0500
From:   Waiman Long <longman@...hat.com>
To:     Will Deacon <will@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Lai Jiangshan <jiangshanlai@...il.com>, qperret@...gle.com
Subject: Re: [PATCH v10 2/5] sched: Use user_cpus_ptr for saving user provided
 cpumask in sched_setaffinity()

On 1/17/23 11:08, Will Deacon wrote:
> Hi Waiman,
>
> On Thu, Sep 22, 2022 at 02:00:38PM -0400, Waiman Long wrote:
>> The user_cpus_ptr field is added by commit b90ca8badbd1 ("sched:
>> Introduce task_struct::user_cpus_ptr to track requested affinity"). It
>> is currently used only by arm64 arch due to possible asymmetric CPU
>> setup. This patch extends its usage to save user provided cpumask
>> when sched_setaffinity() is called for all arches. With this patch
>> applied, user_cpus_ptr, once allocated after a successful call to
>> sched_setaffinity(), will only be freed when the task exits.
>>
>> Since user_cpus_ptr is supposed to be used for "requested
>> affinity", there is actually no point to save current cpu affinity in
>> restrict_cpus_allowed_ptr() if sched_setaffinity() has never been called.
>> Modify the logic to set user_cpus_ptr only in sched_setaffinity() and use
>> it in restrict_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
>> if defined but not changing it.
>>
>> There will be some changes in behavior for arm64 systems with asymmetric
>> CPUs in some corner cases. For instance, if sched_setaffinity()
>> has never been called and there is a cpuset change before
>> relax_compatible_cpus_allowed_ptr() is called, its subsequent call will
>> follow what the cpuset allows but not what the previous cpu affinity
>> setting allows.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>> ---
>>   kernel/sched/core.c  | 82 ++++++++++++++++++++------------------------
>>   kernel/sched/sched.h |  7 ++++
>>   2 files changed, 44 insertions(+), 45 deletions(-)
> We've tracked this down as the cause of an arm64 regression in Android and I've
> reproduced the issue with mainline.
>
> Basically, if an arm64 system is booted with "allow_mismatched_32bit_el0" on
> the command-line, then the arch code will (amongst other things) call
> force_compatible_cpus_allowed_ptr() and relax_compatible_cpus_allowed_ptr()
> when exec()'ing a 32-bit or a 64-bit task respectively.

IOW, relax_compatible_cpus_allowed_ptr() can be called without a 
previous force_compatible_cpus_allowed_ptr(). Right?

A possible optimization in this case is to add a bit flag in the 
task_struct to indicate a previous call to 
force_compatible_cpus_allowed_ptr(). Without that flag set, 
relax_compatible_cpus_allowed_ptr() can return immediately.
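
To make the flag idea concrete, here is a rough userspace model. It is
only a sketch: the toy_* names, the 64-bit mask type and the
affinity_forced field are all made up for illustration and are not
kernel code or part of any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: bit n set means CPU n is allowed. */
typedef uint64_t toy_cpumask_t;

struct toy_task {
	toy_cpumask_t cpus_allowed;   /* current effective affinity */
	toy_cpumask_t cpuset_allowed; /* what the cpuset permits */
	bool affinity_forced;         /* hypothetical flag set by the force path */
};

/* Stand-in for force_compatible_cpus_allowed_ptr(): narrow and remember it. */
static inline void toy_force_compatible(struct toy_task *t,
					toy_cpumask_t compat_mask)
{
	assert(t != NULL);
	t->cpus_allowed &= compat_mask;
	t->affinity_forced = true;
}

/*
 * Stand-in for relax_compatible_cpus_allowed_ptr(): return immediately
 * unless a previous force call set the flag. Returns true if it did work.
 */
static inline bool toy_relax_compatible(struct toy_task *t)
{
	assert(t != NULL);
	if (!t->affinity_forced)
		return false;
	t->cpus_allowed = t->cpuset_allowed;
	t->affinity_forced = false;
	return true;
}
```

In this model a task that was never forced goes through
toy_relax_compatible() untouched, while a forced task gets back whatever
its cpuset allows.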

>
> If you consider a system where everything is 64-bit but the cmdline option
> above is present, then the call to relax_compatible_cpus_allowed_ptr() isn't
> expected to do anything in this case, and the old code made sure of that:
>
>> @@ -3055,30 +3032,21 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask);
>>   
>>   /*
>>    * Restore the affinity of a task @p which was previously restricted by a
>> - * call to force_compatible_cpus_allowed_ptr(). This will clear (and free)
>> - * @p->user_cpus_ptr.
>> + * call to force_compatible_cpus_allowed_ptr().
>>    *
>>    * It is the caller's responsibility to serialise this with any calls to
>>    * force_compatible_cpus_allowed_ptr(@p).
>>    */
>>   void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>>   {
>> -	struct cpumask *user_mask = p->user_cpus_ptr;
>> -	unsigned long flags;
>> +	int ret;
>>   
>>   	/*
>> -	 * Try to restore the old affinity mask. If this fails, then
>> -	 * we free the mask explicitly to avoid it being inherited across
>> -	 * a subsequent fork().
>> +	 * Try to restore the old affinity mask with __sched_setaffinity().
>> +	 * Cpuset masking will be done there too.
>>   	 */
>> -	if (!user_mask || !__sched_setaffinity(p, user_mask))
>> -		return;
> ... since it returned early here if '!user_mask' ...
The flag bit will work like the user_mask check here.
>
>> -
>> -	raw_spin_lock_irqsave(&p->pi_lock, flags);
>> -	user_mask = clear_user_cpus_ptr(p);
>> -	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
>> -
>> -	kfree(user_mask);
>> +	ret = __sched_setaffinity(p, task_user_cpus(p));
>> +	WARN_ON_ONCE(ret);
> ... however, now we end up going down into __sched_setaffinity() with
> task_user_cpus(p) giving us the 'cpu_possible_mask'! This can lead to a mixture
> of WARN_ON()s and incorrect affinity masks (for example, a newly exec'd task
> ends up with the affinity mask of the online CPUs at the point of exec() and is
> unable to run on anything onlined later).

CPU hotplug should update the cpumasks of existing running applications, 
as allowed by cpuset.


>
> I've had a crack at fixing the code above to restore the old behaviour, and it
> seems to work for my basic tests (still pending confirmation from others):
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bb1ee6d7bdde..0d4a11384648 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3125,17 +3125,16 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx);
>   void relax_compatible_cpus_allowed_ptr(struct task_struct *p)
>   {
>          struct affinity_context ac = {
> -               .new_mask  = task_user_cpus(p),
> +               .new_mask  = p->user_cpus_ptr,
>                  .flags     = 0,
>          };
> -       int ret;
>   
>          /*
>           * Try to restore the old affinity mask with __sched_setaffinity().
>           * Cpuset masking will be done there too.
>           */
> -       ret = __sched_setaffinity(p, &ac);
> -       WARN_ON_ONCE(ret);
> +       if (ac.new_mask)
> +               WARN_ON_ONCE(__sched_setaffinity(p, &ac));
>   }
>   
>   void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
>
> With this change, task_user_cpus() is only used by restrict_cpus_allowed_ptr()
> so I'd be inclined to remove it altogether tbh.
>
> What do you think?

The problem here is that force_compatible_cpus_allowed_ptr() can be 
called without a matching relax_compatible_cpus_allowed_ptr() at the 
end. So we may end up artificially restricting the number of CPUs that 
can be used when running a 64-bit binary.
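
A toy userspace model of that concern, assuming relax only acts when a
user mask exists (as in the proposed diff) and that the force path does
not record one. All model_* names are invented for this sketch and it
is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: bit n set means CPU n is allowed. */
typedef uint64_t model_cpumask_t;

struct model_task {
	model_cpumask_t cpus_allowed;     /* current effective affinity */
	model_cpumask_t cpuset_allowed;   /* what the cpuset permits */
	const model_cpumask_t *user_cpus; /* NULL until sched_setaffinity() */
};

/* Stand-in for the force path: narrows the mask, leaves user_cpus alone. */
static inline void model_force(struct model_task *t, model_cpumask_t compat)
{
	assert(t != NULL);
	t->cpus_allowed &= compat;
}

/* Stand-in for relax with an early return when no user mask is set. */
static inline void model_relax(struct model_task *t)
{
	assert(t != NULL);
	if (!t->user_cpus)
		return; /* nothing to restore from, mask stays as it is */
	t->cpus_allowed = *t->user_cpus & t->cpuset_allowed;
}
```

With user_cpus left NULL, a model_force() to 0x3 followed by
model_relax() leaves cpus_allowed at 0x3 even though the cpuset allows
0xF, which is the artificial restriction described above.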

What do you think about the idea of having a bit flag to track that?

Cheers,
Longman