linux-kernel - Re: [RFC 1/2] x86/bugs: Disable coresched on hardware that does not need it

Message-ID: <f2fedf2d-9fad-2648-4c6a-1f3378f6d1b9@amazon.com>
Date:   Thu, 12 Nov 2020 21:01:47 +0100
From:   Alexander Graf <graf@...zon.com>
To:     Joel Fernandes <joel@...lfernandes.org>
CC:     Nishanth Aravamudan <naravamudan@...italocean.com>,
        Julien Desfossez <jdesfossez@...italocean.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Tim Chen" <tim.c.chen@...ux.intel.com>,
        Vineeth Pillai <viremana@...ux.microsoft.com>,
        Aaron Lu <aaron.lwe@...il.com>,
        Aubrey Li <aubrey.intel@...il.com>,
        Thomas Glexiner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "Linus Torvalds" <torvalds@...ux-foundation.org>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "Pawan Gupta" <pawan.kumar.gupta@...ux.intel.com>,
        Paolo Bonzini <pbonzini@...hat.com>, <vineeth@...byteword.org>,
        Chen Yu <yu.c.chen@...el.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Agata Gruza <agata.gruza@...el.com>,
        Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
        <konrad.wilk@...cle.com>, Dario Faggioli <dfaggioli@...e.com>,
        Paul Turner <pjt@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Patrick Bellasi <derkling@...gle.com>,
        benbjiang(蒋彪) <benbjiang@...cent.com>,
        "Alexandre Chartre" <alexandre.chartre@...cle.com>,
        <James.Bottomley@...senpartnership.com>, <OWeisse@...ch.edu>,
        Dhaval Giani <dhaval.giani@...cle.com>,
        Junaid Shahid <junaids@...gle.com>,
        Jesse Barnes <jsbarnes@...gle.com>,
        "Hyser,Chris" <chris.hyser@...cle.com>,
        Ben Segall <bsegall@...gle.com>, Josh Don <joshdon@...gle.com>,
        Hao Luo <haoluo@...gle.com>,
        "Anand K. Mistry" <amistry@...gle.com>,
        Borislav Petkov <bp@...en8.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        "Dietmar Eggemann" <dietmar.eggemann@....com>,
        "H. Peter Anvin" <hpa@...or.com>, "Ingo Molnar" <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...e.de>, Mike Rapoport <rppt@...nel.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        Tony Luck <tony.luck@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>
Subject: Re: [RFC 1/2] x86/bugs: Disable coresched on hardware that does not
 need it



On 12.11.20 16:28, Joel Fernandes wrote:
> 
> On Thu, Nov 12, 2020 at 03:52:32PM +0100, Alexander Graf wrote:
>>
>>
>> On 12.11.20 14:40, Joel Fernandes wrote:
>>>
>>> On Wed, Nov 11, 2020 at 11:29:37PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 11.11.20 23:15, Joel Fernandes wrote:
>>>>>
>>>>> On Wed, Nov 11, 2020 at 5:13 PM Joel Fernandes <joel@...lfernandes.org> wrote:
>>>>>>
>>>>>> On Wed, Nov 11, 2020 at 5:00 PM Alexander Graf <graf@...zon.com> wrote:
>>>>>>> On 11.11.20 22:14, Joel Fernandes wrote:
>>>>>>>>> Some hardware such as certain AMD variants don't have cross-HT MDS/L1TF
>>>>>>>>> issues. Detect this and don't enable core scheduling as it can
>>>>>>>>> needlessly slow the device down.
>>>>>>>>>
>>>>>>>>> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> index dece79e4d1e9..0e6e61e49b23 100644
>>>>>>>>> --- a/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> +++ b/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> @@ -152,6 +152,14 @@ void __init check_bugs(void)
>>>>>>>>>      #endif
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * Do not need core scheduling if CPU does not have MDS/L1TF vulnerability.
>>>>>>>>> + */
>>>>>>>>> +int arch_allow_core_sched(void)
>>>>>>>>> +{
>>>>>>>>> +       return boot_cpu_has_bug(X86_BUG_MDS) || boot_cpu_has_bug(X86_BUG_L1TF);
>>>>>>>
>>>>>>> Can we make this more generic and user settable, similar to the L1 cache
>>>>>>> flushing modes in KVM?
>>>>>>>
>>>>>>> I am not 100% convinced that there are no other thread sibling attacks
>>>>>>> possible without MDS and L1TF. If I'm paranoid, I want to still be able
>>>>>>> to force enable core scheduling.
>>>>>>>
>>>>>>> In addition, we are also using core scheduling as a poor man's mechanism
>>>>>>> to give customers consistent performance for virtual machine thread
>>>>>>> siblings. This is important irrespective of CPU bugs. In such a
>>>>>>> scenario, I want to force enable core scheduling.
>>>>>>
>>>>>> Ok, I can make it a new kernel command line option with:
>>>>>> coresched=on
>>>>>> coresched=secure (only if HW has MDS/L1TF)
>>>>>> coresched=off
>>>>>
>>>>> Also, I would keep "secure" as the default.  (And probably, we should
>>>>> modify the informational messages in sysfs to reflect this..)
>>>>
>>>> I agree that "secure" should be the default.
>>>
>>> Ok.
>>>
>>>> Can we also integrate into the "mitigations" kernel command line[1] for this?
>>>
>>> Sure, the integration into [1] sounds conceptually fine to me; however, it is
>>> not super straightforward. For example: what if a user wants to force-enable
>>> core scheduling for the use case you mention, but still wants the cross-HT
>>> mitigation because they are only tagging VMs (as in your use case) and not
>>> other tasks? I don't know.
>>
>> Can we roll this backwards from what you would expect as a user? How about
>> we make this 2-dimensional?
>>
>>    coresched=[on|off|secure][,force]
>>
>> where "on" means "core scheduling can be done if colors are set", "off"
>> means "no core scheduling is done" and "secure" means "core scheduling can
>> be done on MDS or L1TF if colors are set".
> 
> So support for this force thing is not there ATM in the patchset. We can
> always incrementally add it later. I personally don't expect users to be Ok
> with tagging every single task as it is equivalent to disabling SMT and makes
> coresched useless.

It just flips the default from "always consider everything safe" to 
"always consider everything unsafe". Inside a cgroup, you can still set 
the same color to make use of siblings.

Either way, I agree that it can be a follow-up.
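
To make the two-dimensional syntax concrete, here is a rough userspace
sketch of how "coresched=[on|off|secure][,force]" could be parsed. None of
this is in the patchset: the enum, the struct and parse_coresched() are
hypothetical names used only for illustration, and a real implementation
would hook into early_param() or similar instead.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; nothing here exists in the patchset. */
enum coresched_mode {
	CORESCHED_OFF,		/* no core scheduling is done */
	CORESCHED_ON,		/* core scheduling if colors are set */
	CORESCHED_SECURE,	/* as "on", but only on MDS/L1TF hardware */
};

struct coresched_opts {
	enum coresched_mode mode;
	bool force;		/* ",force": apply a color to every new task */
};

/*
 * Parse "on", "off" or "secure", optionally followed by ",force".
 * Returns 0 on success and -1 on an unrecognized string, in which
 * case *opts is left untouched.
 */
static int parse_coresched(const char *arg, struct coresched_opts *opts)
{
	struct coresched_opts tmp = { .mode = CORESCHED_SECURE, .force = false };
	const char *comma = strchr(arg, ',');
	size_t len = comma ? (size_t)(comma - arg) : strlen(arg);

	if (len == 2 && !strncmp(arg, "on", len))
		tmp.mode = CORESCHED_ON;
	else if (len == 3 && !strncmp(arg, "off", len))
		tmp.mode = CORESCHED_OFF;
	else if (len == 6 && !strncmp(arg, "secure", len))
		tmp.mode = CORESCHED_SECURE;
	else
		return -1;

	if (comma) {
		/* Only one modifier is proposed so far. */
		if (strcmp(comma + 1, "force"))
			return -1;
		tmp.force = true;
	}

	*opts = tmp;
	return 0;
}
```

With this sketch, parse_coresched("secure,force", &opts) yields
mode == CORESCHED_SECURE with force set, while an unknown mode string is
rejected. Whether "off,force" should be an error is left open here; the
sketch accepts it.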

> 
>> The "force" option would then mean "apply a color to every new task".
>>
>> What then happens with mitigations= is easy. "auto" means
>> "coresched=secure". "off" means "coresched=off" and if you want to force
>> core scheduling for everything if necessary, you just do mitigations=auto
>> coresched=auto,force.
>>
>> Am I missing something obvious? :)
> 
> I guess I am confused by the following usage:
> mitigations=auto,nosmt coresched=secure
> 
> Note that auto,nosmt disables SMT selectively *only if needed*. Now, you add
> coresched=secure to the mix. Should auto,nosmt disable SMT or not? It should be
> disabled if the user did not tag anything (because the system is insecure). It
> should be enabled if they tagged things. So it really depends on the user doing
> the right thing. And it is super confusing already -- I would just rather
> keep coresched= separate from mitigations= and document it properly. TBH,
> coresched does require the system admin / designer to tag things as needed, so
> why pretend that it's easy to configure anyway? :)

coresched=secure still won't allow you to trust your system without 
thinking about it, while nosmt does. So I would say that nosmt does not 
imply anything for coresched (until ,force is available, then we're 
talking ...)

The main thing I'm interested in though is mitigations=off. When you 
know you only care about performance and not side channel security (HPC 
for example), then you can in general just set mitigations=off. That 
should definitely affect the core scheduling setting as well.
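
As a rough sketch of what I have in mind (userspace C, hypothetical names
throughout): the default follows mitigations=, so "auto" behaves like
coresched=secure and "off" behaves like coresched=off. The sketch also
assumes that an explicit coresched= on the command line overrides that
default; that precedence is my assumption and has not been agreed on in
this thread.

```c
#include <stdbool.h>

/* Hypothetical names; nothing here exists in the patchset. */
enum coresched_mode { CORESCHED_OFF, CORESCHED_ON, CORESCHED_SECURE };

struct coresched_inputs {
	bool mitigations_off;		/* mitigations=off was given */
	bool coresched_given;		/* an explicit coresched= was given */
	enum coresched_mode coresched;	/* only valid if coresched_given */
	bool cpu_has_mds_or_l1tf;	/* X86_BUG_MDS or X86_BUG_L1TF set */
};

/* Decide whether core scheduling may be used at all. */
static bool core_sched_allowed(const struct coresched_inputs *in)
{
	enum coresched_mode mode;

	if (in->coresched_given)
		mode = in->coresched;	/* assumption: explicit option wins */
	else if (in->mitigations_off)
		mode = CORESCHED_OFF;	/* mitigations=off -> coresched=off */
	else
		mode = CORESCHED_SECURE; /* mitigations=auto -> coresched=secure */

	switch (mode) {
	case CORESCHED_ON:
		return true;
	case CORESCHED_SECURE:
		return in->cpu_has_mds_or_l1tf;
	case CORESCHED_OFF:
	default:
		return false;
	}
}
```

So an HPC box booted with just mitigations=off gets no core scheduling
even on affected hardware, while coresched=on still lets you opt back in
for the consistent-performance use case.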


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Managing directors: Christian Schlaeger, Jonathan Weiss
Registered at the Charlottenburg District Court under HRB 149173 B
Registered office: Berlin
VAT ID: DE 289 237 879