linux-kernel - Re: [RFC PATCH 00/16] Core scheduling v6

Message-ID: <c4556033-4d78-0419-0114-a17f68456ec8@amazon.com>
Date:   Thu, 27 Aug 2020 02:30:39 +0200
From:   Alexander Graf <graf@...zon.com>
To:     Vineeth Remanan Pillai <vpillai@...italocean.com>,
        Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Tim Chen" <tim.c.chen@...ux.intel.com>, <mingo@...nel.org>,
        <tglx@...utronix.de>, <pjt@...gle.com>,
        <torvalds@...ux-foundation.org>
CC:     <linux-kernel@...r.kernel.org>, <subhra.mazumdar@...cle.com>,
        <fweisbec@...il.com>, <keescook@...omium.org>,
        <kerrnel@...gle.com>, "Phil Auld" <pauld@...hat.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>, <joel@...lfernandes.org>,
        <vineethrp@...il.com>, Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>
Subject: Re: [RFC PATCH 00/16] Core scheduling v6

Hi Vineeth,

On 30.06.20 23:32, Vineeth Remanan Pillai wrote:
> Sixth iteration of the Core-Scheduling feature.
> 
> Core scheduling is a feature that allows only trusted tasks to run
> concurrently on CPUs sharing compute resources (e.g. hyperthreads on a
> core). The goal is to mitigate core-level side-channel attacks
> without requiring SMT to be disabled (which has a significant impact on
> performance in some situations). As of v6, core scheduling mitigates
> user-space-to-user-space attacks, and user-to-kernel attacks when one of
> the siblings enters the kernel via interrupts. It is still possible for
> a task to attack the sibling thread when that sibling enters the kernel
> via syscalls.
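To make the stated coverage concrete, here is a small illustrative C sketch that encodes which kernel entry paths the v6 cover letter describes as protected: the sibling is paused on NMI/IRQ/softirq entry, while syscall entry is not covered. The enum and function names are hypothetical; this is not code from the patch series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: illustrative only, not kernel identifiers. */
enum kernel_entry {
    ENTRY_IRQ,
    ENTRY_SOFTIRQ,
    ENTRY_NMI,
    ENTRY_SYSCALL,
};

/* Returns true if, per the v6 cover letter, the sibling hardware thread
 * is paused while this thread runs kernel code entered via 'how'. */
static bool sibling_paused_on_entry(enum kernel_entry how)
{
    switch (how) {
    case ENTRY_IRQ:
    case ENTRY_SOFTIRQ:
    case ENTRY_NMI:
        return true;   /* interrupt-driven entry: sibling is paused */
    case ENTRY_SYSCALL:
        return false;  /* syscall entry: not covered as of v6 */
    }
    assert(!"unknown entry type");
    return false;
}
```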
> 
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a
> tag is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature
> can also be beneficial for RT and performance applications where we
> want to control how tasks make use of SMT dynamically.
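A simplified model of the rule described above, as a C sketch: two tasks may occupy sibling hardware threads at the same time only if either one is idle or both carry the same tag (cookie). The struct and function names are hypothetical and this is not the patch's implementation. It assumes untagged tasks all share cookie 0, which is consistent with the default of leaving scheduler behavior unchanged when nothing is tagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task descriptor; not the kernel's
 * struct task_struct. */
struct sketch_task {
    bool is_idle;          /* the per-CPU idle task */
    unsigned long cookie;  /* 0 = untagged; otherwise the cgroup's tag */
};

/* Two tasks may run on sibling threads concurrently if either is idle
 * or both carry the same cookie. Untagged tasks (cookie 0) remain
 * compatible with each other but not with tagged tasks. */
static bool may_run_together(const struct sketch_task *a,
                             const struct sketch_task *b)
{
    assert(a != NULL && b != NULL);
    if (a->is_idle || b->is_idle)
        return true;
    return a->cookie == b->cookie;
}
```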
> 
> This iteration is mostly a cleanup of v5, except for one major feature:
> pausing the sibling when a CPU enters the kernel via NMI/IRQ/softirq.
> It also introduces documentation and includes minor crash fixes.
> 
> One major cleanup was removing the hotplug support and related code.
> The hotplug-related crashes were not documented, and the fixes piled up
> over time, leading to complex code. We were not able to reproduce the
> crashes in the limited testing done. But if they are reproducible, we
> don't want to hide them. We should document them and design better
> fixes if any.
> 
> In terms of performance, the results in this release are similar to
> v5. On an x86 system with N hardware threads:
> - if only N/2 hardware threads are busy, the performance is similar
>    between baseline, corescheduling and nosmt
> - if N hardware threads are busy with N different corescheduling
>    groups, the impact of corescheduling is similar to nosmt
> - if N hardware threads are busy and multiple active threads share the
>    same corescheduling cookie, they gain a performance improvement over
>    nosmt.
>    The specific performance impact depends on the workload, but for a
>    very busy 12-vCPU database VM (1 coresched tag) running on a
>    36-hardware-thread NUMA node with 96 mostly idle neighbor VMs (each
>    with its own coresched tag), performance drops by 54% with
>    corescheduling and by 90% with nosmt.
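Assuming both percentages are drops relative to the same baseline (the cover letter does not name the metric), core scheduling keeps 46% of baseline performance and nosmt keeps 10%, so core scheduling retains roughly 4.6 times as much. The arithmetic as a small C helper (function names are illustrative):

```c
#include <assert.h>

/* Fraction of baseline throughput remaining after a relative drop given
 * in percent, e.g. a 54% drop leaves 0.46 of baseline. */
static double remaining_fraction(double drop_percent)
{
    assert(drop_percent >= 0.0 && drop_percent < 100.0);
    return (100.0 - drop_percent) / 100.0;
}

/* How many times more throughput configuration A keeps than B, given
 * each one's drop relative to the same baseline. */
static double throughput_ratio(double drop_a_percent, double drop_b_percent)
{
    return remaining_fraction(drop_a_percent) /
           remaining_fraction(drop_b_percent);
}
```

For example, throughput_ratio(54.0, 90.0) evaluates to approximately 4.6.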
> 
> v6 is rebased on 5.7.6 (a06eb423367e)
> https://github.com/digitalocean/linux-coresched/tree/coresched/v6-v5.7.y

As discussed during Linux Plumbers, here is a small repo with test 
scripts and applications that I've used to look at core scheduling 
unfairness:

   https://github.com/agraf/schedgaps

Please let me know if it's unclear how to use it or if you see issues in 
your environment.

Please also make sure to only run this on idle server-class hardware. 
Notebooks will most definitely have too many uncontrollable sources of 
timing entropy to give sensible results.
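For readers unfamiliar with this kind of measurement, one generic way to observe scheduling gaps is to spin on a monotonic clock and record how long the thread went between consecutive reads. The C sketch below shows that technique under POSIX assumptions; it is not the code from the schedgaps repository, and the function names and threshold are hypothetical. Any background activity on the machine shows up directly as extra gaps, which is why an idle server gives cleaner numbers than a notebook.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin for 'iterations' clock reads and count how often two consecutive
 * reads are more than 'threshold_ns' apart, i.e. how often the thread
 * was descheduled or stalled long enough to notice. The largest gap
 * observed is stored in *max_gap_ns. */
static long count_gaps(long iterations, int64_t threshold_ns,
                       int64_t *max_gap_ns)
{
    assert(max_gap_ns != NULL && iterations >= 0);
    long gaps = 0;
    int64_t prev = now_ns();
    *max_gap_ns = 0;
    for (long i = 0; i < iterations; i++) {
        int64_t t = now_ns();
        int64_t delta = t - prev;
        if (delta > *max_gap_ns)
            *max_gap_ns = delta;
        if (delta > threshold_ns)
            gaps++;
        prev = t;
    }
    return gaps;
}
```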


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879