Message-ID:
<LV3PR12MB92653467FCBB04AB236DBB5B945A2@LV3PR12MB9265.namprd12.prod.outlook.com>
Date: Wed, 13 Nov 2024 14:49:16 +0000
From: "Kaplan, David" <David.Kaplan@....com>
To: "Manwaring, Derek" <derekmn@...zon.com>, "jackmanb@...gle.com"
<jackmanb@...gle.com>
CC: "bp@...en8.de" <bp@...en8.de>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>,
"jpoimboe@...nel.org" <jpoimboe@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>,
"pawan.kumar.gupta@...ux.intel.com" <pawan.kumar.gupta@...ux.intel.com>,
"peterz@...radead.org" <peterz@...radead.org>, "tglx@...utronix.de"
<tglx@...utronix.de>, "x86@...nel.org" <x86@...nel.org>, "mlipp@...zon.at"
<mlipp@...zon.at>, "canellac@...zon.at" <canellac@...zon.at>
Subject: RE: [PATCH v2 19/35] Documentation/x86: Document the new attack
vector controls
> -----Original Message-----
> From: Manwaring, Derek <derekmn@...zon.com>
> Sent: Tuesday, November 12, 2024 9:58 PM
> To: Kaplan, David <David.Kaplan@....com>; jackmanb@...gle.com
> Cc: bp@...en8.de; dave.hansen@...ux.intel.com; hpa@...or.com;
> jpoimboe@...nel.org; linux-kernel@...r.kernel.org; mingo@...hat.com;
> pawan.kumar.gupta@...ux.intel.com; peterz@...radead.org; tglx@...utronix.de;
> x86@...nel.org; mlipp@...zon.at; canellac@...zon.at
> Subject: RE: [PATCH v2 19/35] Documentation/x86: Document the new attack
> vector controls
>
> +Brendan
>
> On 2024-11-06 at 14:49+0000, David Kaplan wrote:
> > On 2024-11-06 at 10:39+0000, Borislav Petkov wrote:
> > > One of the arguments against those getting merged is, those are not
> > > going to be *vector* controls anymore but something else:
> > >
> > > mitigate_user - that will mitigate everything that has to do with
> > > executing user processes
> > >
> > > mitigate_guest - same but when running guests
> > >
> > > The third one will be the SMT off: mitigate_cross_thread.
> >
> > Right, so the way I think of this is that there is a cognitive process
> > that administrators must go through:
> >
> > 1. Determine how the system will be used (e.g., am I running untrusted
> >    VMs?)
> > 2. Determine the attack vectors relevant for that configuration (e.g.,
> >    I need guest->host and guest->guest protection)
> > 3. Determine which mitigations are required to enable the desired level
> >    of security (e.g., enable vulnerability X mitigation but not Y)
> >
> > Today, the administrator must do all 3 of these, which requires
> > in-depth knowledge of all these bugs, and isn't forward compatible.
> > The proposed patch series has the kernel take care of step 3, but
> > still requires the administrator to do steps 1 and 2. The provided
> > documentation helps with step 2, but ultimately the admin must decide
> > which attack vectors they want to turn on/off. But the attack vectors
> > are also forward compatible in case new bugs show up in the future.
> >
> > What you've proposed is up-leveling things a bit further and trying to
> > have the kernel do both steps 2 and 3 in the above flow. That is, the
> > admin decides for example they have untrusted userspace, and the
> > kernel then determines they need user->kernel and user->user
> > protection, and then which bug fixes to enable.
> >
> > I'm not necessarily opposed to that, and welcome feedback on this.
> > But as you said, that is not an attack-vector control anymore, it is
> > more of an end-use control. It is possible to do both...we could also
> > create end-use options like the ones you mention, and just map those
> > in a pretty trivial way to the attack vector controls.
>
> I think the further simplification makes sense (merge to mitigate_user or
> mitigate_guest). I would say definitely don't do both (ending up with end-use, vector
> controls, *and* existing parameters). Both just seems like more confusion rather
> than simplification overall.
>
> For me the major dissonance in all of this remains cross_thread. Based on either
> approach (end-use or vector), SMT should be disabled unless the admin explicitly
> asks to keep it (presumably because they are running with core scheduling
> correctly configured).
Cross_thread is certainly a unique one. The philosophy Linux appears to have taken in general is to always mitigate these kinds of bugs by default, unless doing so requires disabling SMT. Others here may know the history better, but I presume that decision was made because of the performance impact of disabling SMT, and because it would be highly disruptive to update your kernel and find that half your cores have disappeared. Still, it creates an incomplete security story.
But you do raise an important point: the relevance of cross-thread protection also depends on the scheduling policy, since these attacks require the victim and attacker to be running on sibling threads. If the scheduling policy prohibits that, then disabling SMT is not required. But the kernel doesn't know whether that policy will be adhered to, which is why I think cross-thread has to be handled separately. It arguably would have made sense to disable SMT unless the admin asks to keep it, but I think that ship has sailed and this doesn't seem like something we can change now.
>
> What if mitigate_user_user defaulted to 'defaults' instead of 'on'? I'm thinking
> 'defaults' meaning "do the things the kernel normally did before thinking in these
> attack-vector terms." That way we could differentiate between "admin didn't specify
> anything" and "admin said they cared about mitigating this vector (or case)." That
> should make it reasonable to disable SMT when mitigate_user_user=on is supplied,
> yeah?
>
Hmm. I don't really like the name 'defaults', although I could envision something like 'partial' meaning do what we do today, while 'on' means disable SMT. But I do worry that if there are too many options that secretly disable SMT under the hood, it will be confusing for users. There is also the forward compatibility worry: the attack vectors are designed to be stable even as new bugs appear. I could imagine users today choosing to enable mitigate_user_user, but if a new bug shows up in the future that requires disabling SMT, all of a sudden they lose half their cores overnight again.
Keeping the SMT disablement unique to the mitigate_cross_thread control, I think, makes it more obvious to users whether there is a chance SMT could get turned off.
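As a rough illustration of that separation, here is a minimal C sketch. The struct and the helper name are hypothetical and are not taken from the patch series. Only mitigate_user_user and mitigate_cross_thread are names used in this thread.

```c
#include <stdbool.h>

/*
 * Illustrative only: this struct and helper are hypothetical and do not
 * come from the patch series. Only the two control names appear in the
 * discussion.
 */
struct vector_controls {
	bool mitigate_user_user;     /* user->user attack vector */
	bool mitigate_cross_thread;  /* sibling-thread (SMT) attacks */
};

/*
 * SMT may be turned off only when the admin opts in via
 * mitigate_cross_thread. No other vector control feeds into this
 * decision, so enabling mitigate_user_user can never cost cores,
 * even if a future bug would need SMT off to be fully mitigated.
 */
static bool smt_may_be_disabled(const struct vector_controls *c)
{
	return c->mitigate_cross_thread;
}
```

With this shape, an admin who sets only mitigate_user_user keeps SMT no matter which bugs appear later. Losing sibling threads is possible only after explicitly enabling mitigate_cross_thread.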
Thanks
--David Kaplan