linux-kernel - Re: [regression] cpuset: offlined CPUs removed from affinity masks


Message-ID: <266054305.17171.1585597982690.JavaMail.zimbra@efficios.com>
Date:   Mon, 30 Mar 2020 15:53:02 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     Li Zefan <lizefan@...wei.com>, cgroups <cgroups@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [regression] cpuset: offlined CPUs removed from affinity masks

----- On Mar 24, 2020, at 3:30 PM, Mathieu Desnoyers mathieu.desnoyers@...icios.com wrote:

> ----- On Mar 24, 2020, at 2:01 PM, Tejun Heo tj@...nel.org wrote:
> 
>> On Thu, Mar 12, 2020 at 03:47:50PM -0400, Mathieu Desnoyers wrote:
>>> The basic idea is to allow applications to pin to every possible cpu, but
>>> not allow them to use this to consume a lot of cpu time on CPUs they
>>> are not allowed to run.
>>> 
>>> Thoughts ?
>> 
>> One thing that we learned is that priority alone isn't enough in isolating cpu
>> consumptions no matter how low the priority may be if the workload is latency
>> sensitive. The actual computation capacity of cpus gets saturated way before cpu
>> time is saturated and latency impact from lowered mips becomes noticeable. So,
>> depending on workloads, allowing threads to run at the lowest priority on
>> disallowed cpus might not lead to behaviors that users expect but I have no idea
>> what kind of usage models you have on mind for the new system call.
> 
[...]

One possibility would be to use the SCHED_IDLE scheduling policy rather than SCHED_OTHER
with nice +19. The unfortunate side effect, AFAIU, is that a thread which requests to
be pinned on a continuously overcommitted CPU may never run. This could come as a
surprise to the user. It would only happen if:

- A thread is pinned on CPU N, and
  - CPU N is not part of the allowed mask for the task's cpuset (and is overcommitted), or
  - CPU N is offline, and the fallback CPU is not part of the allowed mask for the
    task's cpuset (and is overcommitted).
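
For reference, the two policies being compared can be selected from userspace
today. The following is only a minimal sketch of that userspace view, not of the
kernel-side behavior discussed in this thread. The helper names (policy_name,
use_nice19, use_sched_idle) are made up for illustration; the os calls are the
standard Python wrappers around sched_setscheduler(2) and setpriority(2) on Linux.

```python
import os

# Scheduling policy constants exposed by Python's os module on Linux.
_POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def policy_name(policy):
    """Return a readable name for a scheduling policy value (hypothetical helper)."""
    return _POLICY_NAMES.get(policy, "unknown(%d)" % policy)


def use_nice19(pid=0):
    """Run the target as SCHED_OTHER at the lowest nice level (+19).

    pid=0 means the caller. The task still gets a small share of a busy CPU.
    """
    os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))
    os.setpriority(os.PRIO_PROCESS, pid, 19)


def use_sched_idle(pid=0):
    """Run the target under SCHED_IDLE.

    Under this policy the task may get close to no CPU time while the CPU
    has other runnable work, which is the starvation concern raised above.
    """
    os.sched_setscheduler(pid, os.SCHED_IDLE, os.sched_param(0))
```

Calling use_sched_idle() and then policy_name(os.sched_getscheduler(0)) shows which
policy the caller ended up with.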

Is this acceptable behavior? How is userspace supposed to detect this kind of situation
and mitigate it?
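
As a rough illustration of what a userspace check could look like with existing
interfaces, here is a minimal sketch. It assumes the online set is read from
/sys/devices/system/cpu/online and uses the task's current affinity mask from
sched_getaffinity(2) as a stand-in for the cpuset's allowed mask. The function
names are hypothetical. It only classifies the two cases listed above; it cannot
tell whether the CPU is overcommitted or whether the thread is actually being
starved.

```python
import os


def parse_cpu_list(text):
    """Parse the kernel's cpulist format (e.g. '0-3,5') into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def classify_pin(cpu, online, allowed):
    """Classify a requested pin target against the online and allowed sets."""
    if cpu not in online:
        return "offline"
    if cpu not in allowed:
        return "disallowed"
    return "ok"


def check_pin(cpu, pid=0):
    """Classify a pin target for a task (pid=0 means the caller).

    Assumption: the affinity mask approximates the cpuset's allowed mask.
    """
    with open("/sys/devices/system/cpu/online") as f:
        online = parse_cpu_list(f.read())
    allowed = os.sched_getaffinity(pid)
    return classify_pin(cpu, online, allowed)
```

A caller could run check_pin(n) before pinning to CPU n and fall back to another
CPU when the result is not "ok".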

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com