Message-ID: <DM8PR11MB575046453648C384A3D29020E7512@DM8PR11MB5750.namprd11.prod.outlook.com>
Date: Mon, 19 Feb 2024 17:54:39 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>, "Peter
Zijlstra" <peterz@...radead.org>, "Williams, Dan J"
<dan.j.williams@...el.com>, Michael Roth <michael.roth@....com>, Ashish Kalra
<ashish.kalra@....com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"Dong, Eddie" <eddie.dong@...el.com>, Jeremi Piotrowski
<jpiotrowski@...ux.microsoft.com>
Subject: RE: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM
> Subject: Re: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM
>
> On 2/12/24 04:40, Reshetova, Elena wrote:
> >> This series adds SEV-SNP support for running Linux under a Secure VM
> >> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
> >> By running at a less privileged VMPL, the SVSM can be used to provide
> >> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
> >> VM (CVM) rather than trust such services from the hypervisor.
> >>
> >> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
> >> there are certain SNP related operations that require that VMPL level.
> >> Specifically, the PVALIDATE instruction and the RMPADJUST instruction
> >> when setting the VMSA attribute of a page (used when starting APs).
> >>
> >> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
> >> use an SVSM (which is running at VMPL0) to perform the operations that
> >> it is no longer able to perform.
> >>
> >> How Linux interacts with and uses the SVSM is documented in the SVSM
> >> specification [1] and the GHCB specification [2].
> >>
> >> This series introduces support to run Linux under an SVSM. It consists
> >> of:
> >> - Detecting the presence of an SVSM
> >> - When not running at VMPL0, invoking the SVSM for page validation and
> >> VMSA page creation/deletion
> >> - Adding a sysfs entry that specifies the Linux VMPL
> >> - Modifying the sev-guest driver to use the VMPCK key associated with
> >> the Linux VMPL
> >> - Expanding the config-fs TSM support to request attestation reports
> >> from the SVSM
> >> - Detecting and allowing Linux to run in a VMPL other than 0 when an
> >> SVSM is present
> >
> > Hi Tom and everyone,
> >
> > This patch set imo is a good opportunity to start a wider discussion on
> > SVSM-style confidential guests that we actually wanted to start anyhow
> > because TDX will need something similar in the future.
> > So let me explain our thinking and try to align together here.
> >
> > In addition to the existing notion of a Confidential Computing (CoCo) guest,
> > both Intel and AMD define a concept whereby a CoCo guest can be further
> > subdivided/partitioned into different SW layers running with different
> > privileges. In the AMD Secure Encrypted Virtualization with Secure Nested
> > Paging (SEV-SNP) architecture this is called VM Permission Levels (VMPLs)
> > and in the Intel Trust Domain Extensions (TDX) architecture it is called
> > TDX Partitioning. The most privileged part of a CoCo guest is referred to
> > as running at VMPL0 for AMD SEV-SNP and as L1 for Intel TDX Partitioning.
> > This privilege level has full control over the other components running
> > inside a CoCo guest, and some operations are only allowed to be executed
> > by the SW running at this privilege level. The assumption is that this
> > level is used for a Virtual Machine Monitor (VMM)/Hypervisor like KVM and
> > others, or a lightweight Service Manager (SM) like coconut-SVSM [3].
>
> I'm not sure what you mean about the level being used for a
> VMM/hypervisor, since they are running in the host. Coconut-SVSM is
> correct, since it is running within the guest context.
What I meant is that this privilege level can in principle also be used to
host any hypervisor/VMM (not on the host, but in the guest).
For TDX we have published PoCs in the past that enabled
KVM running as L1 inside the guest.
>
> > The actual workload VM (together with its OS) is expected to run at a
> > different privilege level (!VMPL0 in the AMD case and the L2 layer in the Intel case).
> > Both architectures in our current understanding (please correct if this is
> > not true for AMD) allow for different workload VM options starting from
> > a fully unmodified legacy OS to a fully enabled/enlightened AMD SEV-SNP/
> > Intel TDX guest and anything in between. However, each workload guest
>
> I'm not sure about the "anything in between" aspect. I would think that if
> the guest is enlightened it would be fully enlightened or not at all. It
> would be difficult to try to decide what operations should be sent to the
> SVSM to handle, and how that would occur if the guest OS is unaware of the
> SVSM protocol to use. If it is aware of the protocol, then it would just
> use it.
Architecturally we can support guests that fall somewhere in between a fully
enlightened guest and a legacy non-CoCo guest, albeit I am not saying it is the
way to go. A minimally enlightened guest can ask the SVSM for some services
(e.g. attestation evidence) but behave as fully unenlightened when it comes to
other things (like handling MMIO, which would be emulated by the SVSM or
forwarded to the host).
>
> For the unenlightened guest, it sounds like more of a para-visor approach
> being used where the guest wouldn't know that control was ever transferred
> to the para-visor to handle the event. With SNP, that would be done
> through a feature called Reflect-VC. But that means it is an all or
> nothing action.
Thank you for the SEV insights.
>
> > option requires a different level of implementation support from the most
> > privileged VMPL0/L1 layer as well as from the workload OS itself (running
> > at !VMPL0/L2) and also has different effects on overall performance and
> > other factors. Linux, being one of the workload OSes, currently doesn’t
> > define a common notion or interfaces for such a special type of CoCo guest,
> > and there is a risk that each vendor duplicates a lot of common concepts
> > inside AMD SEV-SNP or Intel TDX specific code. This is not the approach
> > Linux usually prefers, and a vendor-agnostic solution should be explored first.
> >
> > So this is an attempt to start a joint discussion on how/what/if we can unify
> > in this space and following the recent lkml thread [1], it seems we need
> > to first clarify how we see this special !VMPL0/L2 guest and whether we
> > can or need to define a common notion for it.
> > The following options are *theoretically* possible:
> >
> > 1. Keep the !VMPL0/L2 guest as unmodified AMD SEV-SNP/Intel TDX guest
> > and hide all complexity inside the VMPL0/L1 VMM and/or respective Intel/AMD
> > architecture internal components. This likely creates additional complexity
> > in the implementation of VMPL0/L1 layer compared to other options below.
> > This option also doesn’t allow service providers to unify their interfaces
> > between AMD/Intel solutions, but requires their VMPL0/L1 layer to handle
> > differences between these guests. On a plus side this option requires no
> > changes in existing AMD SEV-SNP/Intel TDX Linux guest code to support
> > !VMPL0/L2 guest. The big open question we have here for AMD folks is
> > whether it is architecturally feasible for you to support this case?
>
> It is architecturally feasible to support this, but it would come with a
> performance penalty. For SNP, all #VC exceptions would be routed back to
> the HV, into the SVSM/para-visor to be processed, back to the HV and
> finally back to the guest. While we would expect some operations, such as
> PVALIDATE, to have to make this kind of exchange, operations such as CPUID
> or MSR accesses would suffer.
Sorry for my ignorance, what is the HV?
>
> >
> > 2. Keep it as Intel TDX/AMD SEV-SNP guest with some Linux guest internal
> > code logic to handle whether it runs at L1 vs L2 / at VMPL0 vs !VMPL0.
> > This is essentially what this patch series is doing for AMD.
> > This option potentially creates many if statements inside the respective Linux
> > implementation of these technologies to handle the differences, complicates
> > the code, and doesn’t allow service providers to unify their L1/VMPL0 code.
> > This option was also previously proposed for Intel TDX in this lkml thread [1]
> > and got a negative initial reception.
>
> I think the difference here is that the guest would still be identified as
> an SNP guest and still use all of the memory encryption and #VC handling
> it does today. It is just specific VMPL0-only operations that would need
> to performed by the SVSM instead of by the guest.
I see, you are saying there is less fragmentation overall, but I think this
option still reflects some of it.
>
> >
> > 3. Keep it as a legacy non-CoCo guest. This option is very bad from a
> > performance point of view since all I/O must be done via the VMPL0/L1 layer,
> > and it is considered infeasible/unacceptable by service providers
> > (performance of networking and disk is horrible). It also requires an
> > extensive implementation in the VMPL0/L1 layer to support the emulation of
> > all devices.
> >
> > 4. Define a new guest abstraction/guest type that would be used for the
> > !VMPL0/L2 guest. This allows in the future to define a unified L2 <-> L1 /
> > !VMPL0 <-> VMPL0 communication interface that underneath would use the
> > Intel TDX/AMD SEV-SNP specified communication primitives. Out of existing
> > Linux code, this approach is followed to some initial degree by the MSFT
> > Hyper-V implementation [2]. It defines a new type of virtualized guest with
> > its own initialization path and callbacks in x86_platform.guest/hyper.*.
> > However, in our understanding no one has yet attempted to define a unified
> > abstraction for such a guest, nor a unified interface. AMD SEV-SNP has
> > defined in [4] a VMPL0 <--> !VMPL0 communication interface which is AMD
> > specific.
>
> Can TDX create a new protocol within the SVSM that it could use?
Kirill already commented on this, and the answer is of course we can, but imo
we need to see the bigger picture first. If we go with option 2 above, then
coming up with a joint protocol is only of limited use, because we likely won't
be able to share the code in the guest kernel. Ideally I think we want a common
concept and a common protocol that we can share in both the guest kernel and
coconut-svsm.
Btw, is continuing the discussion here the best/preferred/most efficient way
forward? Or should we set up a call with anyone who is interested in the topic
to form a joint understanding of what can be done here?
Best Regards,
Elena.
>
> Thanks,
> Tom
>
> >
> > 5. Anything else we are missing?
> >
> > References:
> >
> > [1] https://lkml.org/lkml/2023/11/22/1089
> >
> > [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
> > partitioning guest:
> > https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
> >
> > [3] https://github.com/coconut-svsm/svsm
> >
> > [4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-
> docs/specifications/58019.pdf
> >
> >