Message-ID: <f2fedf2d-9fad-2648-4c6a-1f3378f6d1b9@amazon.com>
Date: Thu, 12 Nov 2020 21:01:47 +0100
From: Alexander Graf <graf@...zon.com>
To: Joel Fernandes <joel@...lfernandes.org>
CC: Nishanth Aravamudan <naravamudan@...italocean.com>,
Julien Desfossez <jdesfossez@...italocean.com>,
Peter Zijlstra <peterz@...radead.org>,
"Tim Chen" <tim.c.chen@...ux.intel.com>,
Vineeth Pillai <viremana@...ux.microsoft.com>,
Aaron Lu <aaron.lwe@...il.com>,
Aubrey Li <aubrey.intel@...il.com>,
Thomas Glexiner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Kees Cook <keescook@...omium.org>,
Greg Kerr <kerrnel@...gle.com>, Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Mel Gorman <mgorman@...hsingularity.net>,
"Pawan Gupta" <pawan.kumar.gupta@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>, <vineeth@...byteword.org>,
Chen Yu <yu.c.chen@...el.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Agata Gruza <agata.gruza@...el.com>,
Antonio Gomez Iglesias <antonio.gomez.iglesias@...el.com>,
<konrad.wilk@...cle.com>, Dario Faggioli <dfaggioli@...e.com>,
Paul Turner <pjt@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Patrick Bellasi <derkling@...gle.com>,
benbjiang(蒋彪) <benbjiang@...cent.com>,
"Alexandre Chartre" <alexandre.chartre@...cle.com>,
<James.Bottomley@...senpartnership.com>, <OWeisse@...ch.edu>,
Dhaval Giani <dhaval.giani@...cle.com>,
Junaid Shahid <junaids@...gle.com>,
Jesse Barnes <jsbarnes@...gle.com>,
"Hyser,Chris" <chris.hyser@...cle.com>,
Ben Segall <bsegall@...gle.com>, Josh Don <joshdon@...gle.com>,
Hao Luo <haoluo@...gle.com>,
"Anand K. Mistry" <amistry@...gle.com>,
Borislav Petkov <bp@...en8.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
"Dietmar Eggemann" <dietmar.eggemann@....com>,
"H. Peter Anvin" <hpa@...or.com>, "Ingo Molnar" <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...e.de>, Mike Rapoport <rppt@...nel.org>,
Tom Lendacky <thomas.lendacky@....com>,
Tony Luck <tony.luck@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>
Subject: Re: [RFC 1/2] x86/bugs: Disable coresched on hardware that does not need it

On 12.11.20 16:28, Joel Fernandes wrote:
>
> On Thu, Nov 12, 2020 at 03:52:32PM +0100, Alexander Graf wrote:
>>
>>
>> On 12.11.20 14:40, Joel Fernandes wrote:
>>>
>>> On Wed, Nov 11, 2020 at 11:29:37PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 11.11.20 23:15, Joel Fernandes wrote:
>>>>>
>>>>> On Wed, Nov 11, 2020 at 5:13 PM Joel Fernandes <joel@...lfernandes.org> wrote:
>>>>>>
>>>>>> On Wed, Nov 11, 2020 at 5:00 PM Alexander Graf <graf@...zon.com> wrote:
>>>>>>> On 11.11.20 22:14, Joel Fernandes wrote:
>>>>>>>>> Some hardware such as certain AMD variants don't have cross-HT MDS/L1TF
>>>>>>>>> issues. Detect this and don't enable core scheduling as it can
>>>>>>>>> needlessly slow the device down.
>>>>>>>>>
>>>>>>>>> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> index dece79e4d1e9..0e6e61e49b23 100644
>>>>>>>>> --- a/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> +++ b/arch/x86/kernel/cpu/bugs.c
>>>>>>>>> @@ -152,6 +152,14 @@ void __init check_bugs(void)
>>>>>>>>> #endif
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * Do not need core scheduling if CPU does not have MDS/L1TF vulnerability.
>>>>>>>>> + */
>>>>>>>>> +int arch_allow_core_sched(void)
>>>>>>>>> +{
>>>>>>>>> + return boot_cpu_has_bug(X86_BUG_MDS) || boot_cpu_has_bug(X86_BUG_L1TF);
>>>>>>>
>>>>>>> Can we make this more generic and user settable, similar to the L1 cache
>>>>>>> flushing modes in KVM?
>>>>>>>
>>>>>>> I am not 100% convinced that there are no other thread sibling attacks
>>>>>>> possible without MDS and L1TF. If I'm paranoid, I want to still be able
>>>>>>> to force enable core scheduling.
>>>>>>>
>>>>>>> In addition, we are also using core scheduling as a poor man's mechanism
>>>>>>> to give customers consistent performance for virtual machine thread
>>>>>>> siblings. This is important irrespective of CPU bugs. In such a
>>>>>>> scenario, I want to force enable core scheduling.
>>>>>>
>>>>>> Ok, I can make it new kernel command line option with:
>>>>>> coresched=on
>>>>>> coresched=secure (only if HW has MDS/L1TF)
>>>>>> coresched=off
>>>>>
>>>>> Also, I would keep "secure" as the default. (And probably, we should
>>>>> modify the informational messages in sysfs to reflect this..)
>>>>
>>>> I agree that "secure" should be the default.
>>>
>>> Ok.
>>>
>>>> Can we also integrate into the "mitigations" kernel command line[1] for this?
>>>
>>> Sure, the integration into [1] sounds conceptually fine to me however it is
>>> not super straight forward. Like: What if user wants to force-enable
>>> core-scheduling for the usecase you mention, but still wants the cross-HT
>>> mitigation because they are only tagging VMs (as in your usecase) and not
>>> other tasks. Idk.
>>
>> Can we roll this backwards from what you would expect as a user? How about
>> we make this 2-dimensional?
>>
>> coresched=[on|off|secure][,force]
>>
>> where "on" means "core scheduling can be done if colors are set", "off"
>> means "no core scheduling is done" and "secure" means "core scheduling can
>> be done on MDS or L1TF if colors are set".
>
> So support for this force thing is not there ATM in the patchset. We can
> always incrementally add it later. I personally don't expect users to be Ok
> with tagging every single task as it is equivalent to disabling SMT and makes
> coresched useless.

It just flips the default from "always consider everything safe" to
"always consider everything unsafe". Inside a cgroup, you can still set
the same color to make use of siblings.

Either way, I agree that it can be a follow-up.

>
>> The "force" option would then mean "apply a color to every new task".
>>
>> What then happens with mitigations= is easy. "auto" means
>> "coresched=secure". "off" means "coresched=off" and if you want to force
>> core scheduling for everything if necessary, you just do mitigations=auto
>> coresched=auto,force.
>>
>> Am I missing something obvious? :)
>
> I guess I am confused for the following usage:
> mitigations=auto,nosmt coresched=secure
>
> Note that auto,nosmt disables SMT selectively *only if needed*. Now, you add
> coresched=secure to the mix. Should auto,nosmt disable SMT or not? It should be
> disabled if the user did not tag anything (because system is insecure). It
> should be enabled, if they tagged things. So it really depends on user doing
> the right thing. And it is super confusing already -- I would just rather
> keep coresched= separate from mitigations= and document it properly. TBH-
> coresched does require system admin / designer to tag things as needed so why
> pretend that it's easy to configure anyway? :)

coresched=secure still won't allow you to trust your system without
thinking about it, while nosmt does. So I would say that nosmt does not
imply anything for coresched (until ,force is available, then we're
talking ...)

The main thing I'm interested in though is mitigations=off. When you
know you only care about performance and not side channel security (HPC
for example), then you can in general just set mitigations=off. That
should definitely affect the core scheduling setting as well.
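
To make the semantics discussed in this thread concrete, here is a minimal
userspace sketch in C, not kernel code and not part of the patchset. It
assumes the proposed coresched=[on|off|secure][,force] syntax, with
mitigations=auto defaulting to "secure" and mitigations=off defaulting to
"off". All names (parse_coresched, resolve_coresched, coresched_allowed,
struct coresched_opts) are hypothetical. Letting an explicit coresched=
override the mitigations= default is also an assumption that the thread
does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes mirroring the proposed coresched=[on|off|secure]. */
enum coresched_mode { CORESCHED_OFF, CORESCHED_ON, CORESCHED_SECURE };

struct coresched_opts {
	enum coresched_mode mode;
	bool force;	/* proposed ",force": apply a color to every new task */
};

/* Parse the value following "coresched=". Returns 0 on success, -1 on bad input. */
int parse_coresched(const char *arg, struct coresched_opts *out)
{
	struct coresched_opts tmp = { CORESCHED_OFF, false };
	const char *comma;
	size_t len;

	assert(arg != NULL && out != NULL);

	comma = strchr(arg, ',');
	len = comma ? (size_t)(comma - arg) : strlen(arg);

	if (len == 2 && strncmp(arg, "on", 2) == 0)
		tmp.mode = CORESCHED_ON;
	else if (len == 3 && strncmp(arg, "off", 3) == 0)
		tmp.mode = CORESCHED_OFF;
	else if (len == 6 && strncmp(arg, "secure", 6) == 0)
		tmp.mode = CORESCHED_SECURE;
	else
		return -1;

	if (comma) {
		if (strcmp(comma + 1, "force") != 0)
			return -1;
		tmp.force = true;
	}

	*out = tmp;	/* only written on success */
	return 0;
}

/*
 * Combine mitigations= and coresched=. With no coresched= argument (NULL),
 * mitigations=off maps to "off" and mitigations=auto maps to "secure".
 * ASSUMPTION: an explicit coresched= value takes precedence.
 */
int resolve_coresched(const char *coresched_arg, bool mitigations_off,
		      struct coresched_opts *out)
{
	assert(out != NULL);

	if (coresched_arg)
		return parse_coresched(coresched_arg, out);

	out->mode = mitigations_off ? CORESCHED_OFF : CORESCHED_SECURE;
	out->force = false;
	return 0;
}

/* "secure" permits core scheduling only on CPUs with the MDS or L1TF bug. */
bool coresched_allowed(const struct coresched_opts *o, bool cpu_has_mds_or_l1tf)
{
	switch (o->mode) {
	case CORESCHED_ON:
		return true;
	case CORESCHED_SECURE:
		return cpu_has_mds_or_l1tf;
	default:
		return false;
	}
}
```

Under these assumptions, resolve_coresched(NULL, true, &o) leaves core
scheduling off for the HPC-style mitigations=off case, while an explicit
"on" still permits it for the consistent-performance use case regardless
of CPU bugs.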
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879