linux-kernel - Re: [PATCH v2] sched: Store restrict_cpus_allowed

Message-ID: <e782060f-0c2f-b287-1eaf-3058eac6dcb4@redhat.com>
Date:   Mon, 30 Jan 2023 12:32:51 -0500
From:   Waiman Long <longman@...hat.com>
To:     Will Deacon <will@...nel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, regressions@...ts.linux.dev,
        regressions@...mhuis.info
Subject: Re: [PATCH v2] sched: Store restrict_cpus_allowed_ptr() call state

On 1/26/23 11:11, Will Deacon wrote:
> On Tue, Jan 24, 2023 at 03:24:36PM -0500, Waiman Long wrote:
>> On 1/24/23 14:48, Will Deacon wrote:
>>> On Fri, Jan 20, 2023 at 09:17:49PM -0500, Waiman Long wrote:
>>>> The user_cpus_ptr field was originally added by commit b90ca8badbd1
>>>> ("sched: Introduce task_struct::user_cpus_ptr to track requested
>>>> affinity"). It was used only by arm64 arch due to possible asymmetric
>>>> CPU setup.
>>>>
>>>> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
>>>> cpumask"), task_struct::user_cpus_ptr is repurposed to store user
>>>> requested cpu affinity specified in the sched_setaffinity().
>>>>
>>>> This results in a performance regression in an arm64 system when booted
>>>> with "allow_mismatched_32bit_el0" on the command-line. The arch code will
>>>> (amongst other things) calls force_compatible_cpus_allowed_ptr() and
>>>> relax_compatible_cpus_allowed_ptr() when exec()'ing a 32-bit or a 64-bit
>>>> task respectively. Now a call to relax_compatible_cpus_allowed_ptr()
>>>> will always result in a __sched_setaffinity() call whether there is a
>>>> previous force_compatible_cpus_allowed_ptr() call or not.
>>> I'd argue it's more than just a performance regression -- the affinity
>>> masks are set incorrectly, which is a user visible thing
>>> (i.e. sched_getaffinity() gives unexpected values).
>> Can your elaborate a bit more on what you mean by getting unexpected
>> sched_getaffinity() results? You mean the result is wrong after a
>> relax_compatible_cpus_allowed_ptr(). Right?
> Yes, as in the original report. If, on a 4-CPU system, I do the following
> with v6.1 and "allow_mismatched_32bit_el0" on the kernel cmdline:
>
> # for c in `seq 1 3`; do echo 0 > /sys/devices/system/cpu/cpu$c/online; done
> # yes > /dev/null &
> [1] 334
> # taskset -p 334
> pid 334's current affinity mask: 1
> # for c in `seq 1 3`; do echo 1 > /sys/devices/system/cpu/cpu$c/online; done
> # taskset -p 334
> pid 334's current affinity mask: f
>
> but with v6.2-rc5 that last taskset invocation gives:
>
> pid 334's current affinity mask: 1
>
> so, yes, the performance definitely regresses, but that's because the
> affinity mask is wrong!

Are you using cgroup v1 or v2? Is your process in the root
cgroup/cpuset or a child cgroup/cpuset under root?
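One way to answer both questions is to look at /proc/<pid>/cgroup. The following is only a rough sketch: the helper name cpuset_info is made up for illustration, and it assumes the usual "hierarchy-ID:controller-list:path" line format, where a v1 cpuset entry lists "cpuset" as a controller and the v2 entry has ID 0 and an empty controller list. On a hybrid setup it reports the v1 cpuset entry and ignores the v2 line.

```shell
# Hypothetical helper: classify the lines of /proc/<pid>/cgroup read from stdin.
# Prints "v1 cpuset <path>" if a v1 cpuset hierarchy is listed,
# otherwise "v2 <path>" if only the unified (v2) entry is found.
cpuset_info() {
    awk -F: '
        {
            # Everything after the second colon is the cgroup path
            # (taken with substr so that colons in the path survive).
            path = substr($0, length($1) + length($2) + 3)
        }
        $2 ~ /(^|,)cpuset(,|$)/ && !found { print "v1 cpuset " path; found = 1 }
        $1 == "0" && $2 == ""             { v2 = path }
        END { if (!found && v2 != "") print "v2 " v2 }
    '
}
```

For the task in the report this would be run as "cpuset_info < /proc/334/cgroup". A path of "/" means the root cpuset; anything else is a child.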

If you are using cgroup v1 in a child cpuset, cpuset.cpus works more
like cpuset.cpus.effective. IOW, with an offline and then online hotplug
event, the cpu will be permanently lost from the cpuset. So the above is
an expected result. If you are using cgroup v2, the cpuset should be able to
recover the lost cpu after the online event. If your process is in the
root cpuset, the cpu will not be lost either. Alternatively, if you mount
the v1 cpuset with the "cpuset_v2_mode" flag, it will behave more like
v2 and recover the lost cpu.
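To check whether a cpu was really dropped from the cpuset, the cpu list in cpuset.cpus can be compared with the hex mask that taskset prints. Here is a minimal sketch of that conversion; the function name cpulist_to_mask is an assumption for illustration, and it relies on shell integer arithmetic, so it only covers low-numbered cpus (fewer than 63 or so).

```shell
# Hypothetical helper: convert a cpu list such as "0-3" or "0,2-3"
# into the hex mask format printed by "taskset -p".
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        lo=${range%-*}          # "2-3" -> 2, "0" -> 0
        hi=${range#*-}          # "2-3" -> 3, "0" -> 0
        c=$lo
        while [ "$c" -le "$hi" ]; do
            mask=$(( mask | (1 << c) ))
            c=$(( c + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}
```

After the offline/online cycle, feeding the contents of the task's cpuset.cpus file to this function and getting "1" rather than "f" would point at the cpuset, not the scheduler, as the place where the cpus were lost.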

I ran a similar cpu offline/online test with cgroup v1 and v2 and
confirmed that v1 has the above behavior and v2 is fine.

I believe what you have observed above may not be related to my sched
patch, as I can't see how it would affect cpu hotplug at all.

Cheers,
Longman