linux-kernel - RE: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM

Message-ID: <DM8PR11MB575087DCCB069C723BBE1B47E7482@DM8PR11MB5750.namprd11.prod.outlook.com>
Date: Mon, 12 Feb 2024 10:40:26 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: Tom Lendacky <thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>, "Peter
 Zijlstra" <peterz@...radead.org>, "Williams, Dan J"
	<dan.j.williams@...el.com>, Michael Roth <michael.roth@....com>, Ashish Kalra
	<ashish.kalra@....com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Dong, Eddie" <eddie.dong@...el.com>, Jeremi Piotrowski
	<jpiotrowski@...ux.microsoft.com>
Subject: RE: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM

> This series adds SEV-SNP support for running Linux under a Secure VM
> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
> By running at a less privileged VMPL, the SVSM can be used to provide
> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
> VM (CVM) rather than trust such services from the hypervisor.
> 
> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
> there are certain SNP related operations that require that VMPL level.
> Specifically, the PVALIDATE instruction and the RMPADJUST instruction
> when setting the VMSA attribute of a page (used when starting APs).
> 
> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
> use an SVSM (which is running at VMPL0) to perform the operations that
> it is no longer able to perform.
> 
> How Linux interacts with and uses the SVSM is documented in the SVSM
> specification [1] and the GHCB specification [2].
> 
> This series introduces support to run Linux under an SVSM. It consists
> of:
>   - Detecting the presence of an SVSM
>   - When not running at VMPL0, invoking the SVSM for page validation and
>     VMSA page creation/deletion
>   - Adding a sysfs entry that specifies the Linux VMPL
>   - Modifying the sev-guest driver to use the VMPCK key associated with
>     the Linux VMPL
>   - Expanding the config-fs TSM support to request attestation reports
>     from the SVSM
>   - Detecting and allowing Linux to run in a VMPL other than 0 when an
>     SVSM is present

Hi Tom and everyone, 

IMO this patch set is a good opportunity to start a wider discussion on
SVSM-style confidential guests, which we wanted to start anyway
because TDX will need something similar in the future.
So let me explain our thinking and try to align here.

In addition to the existing notion of a Confidential Computing (CoCo) guest,
both Intel and AMD define a concept in which a CoCo guest can be further
subdivided/partitioned into different SW layers running with different
privileges. In the AMD Secure Encrypted Virtualization with Secure Nested
Paging (SEV-SNP) architecture this is called VM Privilege Levels (VMPLs)
and in the Intel Trust Domain Extensions (TDX) architecture it is called
TDX Partitioning. The most privileged part of a CoCo guest is referred to as
running at VMPL0 for AMD SEV-SNP and as L1 for Intel TDX Partitioning.
This privilege level has full control over the other components running
inside a CoCo guest, and some operations are only allowed to be
executed by the SW running at this privilege level. The assumption is that
this level is used for a Virtual Machine Monitor (VMM)/Hypervisor like KVM
and others or a lightweight Service Manager (SM) like coconut-SVSM [3].
The actual workload VM (together with its OS) is expected to run at a
different privilege level (!VMPL0 in the AMD case and the L2 layer in the
Intel case).
In our current understanding (please correct us if this is not true for
AMD), both architectures allow for different workload VM options, ranging
from a fully unmodified legacy OS to a fully enabled/enlightened AMD
SEV-SNP/Intel TDX guest and anything in between. However, each workload
guest option requires a different level of implementation support from the
most privileged VMPL0/L1 layer as well as from the workload OS itself
(running at !VMPL0/L2), and also has different effects on overall
performance and other factors. Linux, as one of the workload OSes,
currently doesn’t define a common notion or common interfaces for this
special type of CoCo guest, and there is a risk that each vendor
duplicates a lot of common concepts inside AMD SEV-SNP or Intel TDX
specific code. This is not the approach Linux usually prefers, and a
vendor-agnostic solution should be explored first.

So this is an attempt to start a joint discussion on how/what/if we can unify
in this space. Following the recent lkml thread [1], it seems we first need
to clarify how we see this special !VMPL0/L2 guest and whether we
can or need to define a common notion for it.
The following options are *theoretically* possible:

1. Keep the !VMPL0/L2 guest as an unmodified AMD SEV-SNP/Intel TDX guest
and hide all complexity inside the VMPL0/L1 VMM and/or the respective
Intel/AMD architecture internal components. This likely creates additional
complexity in the implementation of the VMPL0/L1 layer compared to the other
options below. This option also doesn’t allow service providers to unify
their interfaces between AMD/Intel solutions, but requires their VMPL0/L1
layer to handle the differences between these guests. On the plus side, this
option requires no changes in the existing AMD SEV-SNP/Intel TDX Linux guest
code to support a !VMPL0/L2 guest. The big open question we have here for the
AMD folks is whether it is architecturally feasible for you to support this
case.

2. Keep it as an Intel TDX/AMD SEV-SNP guest with some Linux guest internal
code logic to handle whether it runs in L1 vs. L2 (Intel) or at VMPL0 vs.
!VMPL0 (AMD). This is essentially what this patch series is doing for AMD.
This option potentially creates many if statements inside the respective Linux
implementations of these technologies to handle the differences, complicates
the code, and doesn’t allow service providers to unify their L1/VMPL0 code.
This option was also previously proposed for Intel TDX in the lkml thread [1]
and got a negative initial reception.

3. Keep it as a legacy non-CoCo guest. This option is very bad from a
performance point of view since all I/O must be done via the VMPL0/L1 layer,
and it is considered infeasible/unacceptable by service providers
(performance of networking and disk is horrible). It also requires an
extensive implementation in the VMPL0/L1 layer to support emulation of all
devices.

4. Define a new guest abstraction/guest type that would be used for the
!VMPL0/L2 guest. This would make it possible in the future to define a
unified L2 <-> L1 / !VMPL0 <-> VMPL0 communication interface that underneath
would use the communication primitives specified by Intel TDX/AMD SEV-SNP.
In existing Linux code, this approach is followed to some initial degree by
the MSFT Hyper-V implementation [2]. It defines a new type of virtualized
guest with its own initialization path and callbacks in
x86_platform.guest/hyper.*. However, in our understanding no one has yet
attempted to define a unified abstraction for such a guest, or a unified
interface. AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0 communication
interface, which is AMD specific.

5. Is anything else missing?
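
To make the difference between options 2 and 4 a bit more concrete, here is a
minimal sketch in C. All names in it are hypothetical and exist neither in
Linux nor in this patch series, and the "primitives" are stand-ins that only
count invocations. It is only meant to show the shape of the code: with
option 2 the vendor guest code branches on its privilege level at each call
site, while with option 4 a separate guest type binds vendor callbacks once
and common code calls through them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in Linux or in the series. */

static unsigned int direct_calls, svsm_calls, tdx_l2_calls;

/* Stand-ins for the vendor primitives; they only count invocations. */
static int page_validate_direct(unsigned long pfn)
{
	(void)pfn;
	direct_calls++;
	return 0;
}

static int page_validate_via_svsm(unsigned long pfn)
{
	(void)pfn;
	svsm_calls++;
	return 0;
}

/* Placeholder for whatever the TDX L2 equivalent turns out to be. */
static int tdx_l2_stub_validate(unsigned long pfn)
{
	(void)pfn;
	tdx_l2_calls++;
	return 0;
}

/* Option 2: vendor guest code checks its privilege level at each call site. */
struct snp_guest {
	unsigned int vmpl;	/* 0 is the most privileged level */
	bool svsm_present;	/* assumed to be detected during early boot */
};

static int snp_validate_page(const struct snp_guest *g, unsigned long pfn)
{
	assert(g != NULL);

	if (g->vmpl == 0)
		return page_validate_direct(pfn);
	if (g->svsm_present)
		return page_validate_via_svsm(pfn);
	return -1;	/* !VMPL0 without an SVSM cannot validate pages */
}

/* Option 4: a separate guest type binds vendor callbacks once at init. */
struct l2_guest_ops {
	const char *name;
	int (*validate_page)(unsigned long pfn);
};

static const struct l2_guest_ops snp_l2_ops = { "snp-svsm", page_validate_via_svsm };
static const struct l2_guest_ops tdx_l2_ops = { "tdx-l2", tdx_l2_stub_validate };

/* Common code has no vendor or privilege-level checks of its own. */
static int l2_guest_validate_page(const struct l2_guest_ops *ops, unsigned long pfn)
{
	assert(ops != NULL && ops->validate_page != NULL);
	return ops->validate_page(pfn);
}
```

In the option 4 variant the choice between snp_l2_ops and tdx_l2_ops would be
made once during guest initialization, similar in spirit to how callbacks are
installed for the Hyper-V guest type in [2]. Whether a single ops table can
really cover both architectures is exactly the open question.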

References:

[1] https://lkml.org/lkml/2023/11/22/1089 

[2] MSFT Hyper-V implementation of an AMD SEV-SNP !VMPL0 guest and a TDX L2
partitioning guest:
https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575 

[3] https://github.com/coconut-svsm/svsm  

[4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf