Date:   Wed, 1 Feb 2023 13:46:11 -0500
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
        kernel-team@...roid.com, Zefan Li <lizefan.x@...edance.com>,
        Tejun Heo <tj@...nel.org>,
        Johannes Weiner <hannes@...xchg.org>, cgroups@...r.kernel.org
Subject: Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter
 offline CPUs

On 2/1/23 10:16, Waiman Long wrote:
> On 2/1/23 04:14, Peter Zijlstra wrote:
>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote:
>>> On 1/31/23 17:17, Will Deacon wrote:
>>>> From: Peter Zijlstra <peterz@...radead.org>
>>>>
>>>> There is a difference in behaviour between CPUSET={y,n} that is now
>>>> wreaking havoc with {relax,force}_compatible_cpus_allowed_ptr().
>>>>
>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is
>>>> calling __sched_setaffinity() unconditionally.
>>>>
>>>> But the underlying problem goes back a lot further, possibly to
>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to
>>>> cs->effective_cpus.
>>>>
>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is
>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate
>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root
>>>> cpuset this does not happen and they will forever after be excluded
>>>> from CPUs onlined later.
>>>>
>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask,
>>>> including the offline CPUs.
>>>>
>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
>>>> Reported-by: Will Deacon <will@...nel.org>
>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>>> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
>>>> Signed-off-by: Will Deacon <will@...nel.org>
>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only
>>> tracked online cpus and ignored the offline ones, so it behaves more
>>> like effective_cpus in cpuset v2. With v2, we have 2 cpumasks -
>>> cpus_allowed and effective_cpus. When cpuset v1 is mounted,
>>> cpus_allowed and effective_cpus are effectively the same and track
>>> online cpus. With cpuset v2, cpus_allowed contains what the user has
>>> written into it and it won't be changed until another write happens.
>>> However, what the user has written may not be what the system can give
>>> it, and effective_cpus is what the system decides a cpuset can use.
>>>
>>> Cpuset v2 is able to handle hotplug correctly and update the task's
>>> cpumask accordingly, so previously offlined cpus are not lost with v2.
>>>
>>> Since v1 keeps the old behavior, previously offlined cpus are lost in
>>> the cpuset's cpus_allowed. However, tasks in the root cpuset will
>>> still be fine with cpu hotplug as its cpus_allowed should track
>>> cpu_online_mask. IOW, only tasks in a non-root cpuset suffer from this
>>> problem.
>>>
>>> It was a known issue in v1 and I believe it is one of the major
>>> reasons for the cpuset v2 redesign.
>>>
>>> A major concern I have is the overhead of creating a poor man's
>>> version of the v2 cpus_allowed. This issue can be worked around even
>>> for cpuset v1 if it is mounted with the cpuset_v2_mode option to
>>> behave more like v2 in its cpumask handling. Alternatively, we may be
>>> able to provide a config option to make this the default for v1
>>> without the special mount option, if necessary.
>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT*
>> mask offline cpus for root cgroup tasks, ever. (And the only reason it
>> gets away with masking offline for !root is that it re-applies the mask
>> every time it changes.)
>>
>> Yes it did that for a fair while -- but it is wrong and broken and a
>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must
>> not be.
>>
>> Arguably cpuset-v2 is still wrong for masking offline cpus in its
>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for
>> something that needs to go into /urgent *now*.
>>
>> Hence this minimal patch that at least lets sched_setaffinity() work as
>> intended.
>
> I don't object to the general idea of keeping offline cpus in a task's
> cpu affinity. In the case of a cpu offline event, we can skip removing
> that offline cpu from the task's cpu affinity. That will partially
> solve the problem here and is also simpler.
>
> I believe a main reason why effective_cpus holds only online cpus is
> the need to detect when there are no online cpus available in a given
> cpuset. In that case, it will fall back to the nearest ancestor with
> online cpus.
>
> This offline cpu problem with cpuset v1 has been a known problem for a
> long time. It is not a recent regression.
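
To make the "wider mask" idea above concrete, I read the patch as going
roughly in this direction (an untested sketch only, not the actual diff;
locking is dropped and the task_cs() helper and struct cpuset fields
from kernel/cgroup/cpuset.c are assumed):

/*
 * Untested sketch, not the actual patch: hand back the cpuset's stored
 * cpus_allowed without clipping it to cpu_online_mask, so that CPUs
 * onlined later are not permanently excluded from the task's mask.
 */
void cpuset_cpus_allowed_sketch(struct task_struct *tsk, struct cpumask *pmask)
{
	rcu_read_lock();
	cpumask_copy(pmask, task_cs(tsk)->cpus_allowed);	/* may include offline cpus */
	rcu_read_unlock();
}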

Note that using cpus_allowed directly in cgroup v2 may not be right 
because cpus_allowed may have no relationship to effective_cpus at all 
in some cases, e.g.

    root
     |
     V
     A (cpus_allowed = 1-4, effective_cpus = 1-4)
     |
     V
     B (cpus_allowed = 5-8, effective_cpus = 1-4)

In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong.
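
Roughly, the v2 derivation looks like this (a simplified, untested
illustration, not the exact kernel code; the parent pointer is passed in
explicitly and the struct cpuset fields from kernel/cgroup/cpuset.c are
assumed):

/*
 * Simplified illustration: effective_cpus is cpus_allowed clipped by
 * the parent's effective set, falling back to the parent's set when the
 * intersection is empty.  For B above, 5-8 & 1-4 is empty, so B's tasks
 * actually run on cpus 1-4 while its cpus_allowed still says 5-8.
 */
static void compute_effective_cpus(struct cpuset *cs, struct cpuset *parent)
{
	if (!cpumask_and(cs->effective_cpus, cs->cpus_allowed,
			 parent->effective_cpus))
		cpumask_copy(cs->effective_cpus, parent->effective_cpus);
}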

I wonder how often cpu hotplug happens on those arm64 systems where only
a subset of the cpus can run 32-bit programs.

Cheers,
Longman
