Message-ID: <447b491f-ae1a-85db-a862-0a2b999cd0d4@amd.com>
Date: Fri, 16 Feb 2024 13:46:41 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: "Reshetova, Elena" <elena.reshetova@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
"Williams, Dan J" <dan.j.williams@...el.com>,
Michael Roth <michael.roth@....com>, Ashish Kalra <ashish.kalra@....com>,
"Shutemov, Kirill" <kirill.shutemov@...el.com>,
"Dong, Eddie" <eddie.dong@...el.com>,
Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
Subject: Re: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM
On 2/12/24 04:40, Reshetova, Elena wrote:
>> This series adds SEV-SNP support for running Linux under a Secure VM
>> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
>> By running at a less privileged VMPL, the SVSM can be used to provide
>> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
>> VM (CVM) rather than trusting such services from the hypervisor.
>>
>> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
>> certain SNP-related operations require that VMPL level: specifically,
>> the PVALIDATE instruction, and the RMPADJUST instruction when setting
>> the VMSA attribute of a page (used when starting APs).
>>
>> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
>> use an SVSM (which is running at VMPL0) to perform the operations that
>> it is no longer able to perform.
>>
>> How Linux interacts with and uses the SVSM is documented in the SVSM
>> specification [1] and the GHCB specification [2].
>>
>> This series introduces support to run Linux under an SVSM. It consists
>> of:
>> - Detecting the presence of an SVSM
>> - When not running at VMPL0, invoking the SVSM for page validation and
>> VMSA page creation/deletion
>> - Adding a sysfs entry that specifies the Linux VMPL
>> - Modifying the sev-guest driver to use the VMPCK key associated with
>> the Linux VMPL
>> - Expanding the config-fs TSM support to request attestation reports
>> from the SVSM
>> - Detecting and allowing Linux to run in a VMPL other than 0 when an
>> SVSM is present
>
> Hi Tom and everyone,
>
> This patch set imo is a good opportunity to start a wider discussion on
> SVSM-style confidential guests that we actually wanted to start anyhow,
> because TDX will need something similar in the future.
> So let me explain our thinking and try to align here.
>
> In addition to the existing notion of a Confidential Computing (CoCo)
> guest, both Intel and AMD define a concept whereby a CoCo guest can be
> further subdivided/partitioned into different SW layers running with
> different privileges. In the AMD Secure Encrypted Virtualization with
> Secure Nested Paging (SEV-SNP) architecture this is called VM Privilege
> Levels (VMPLs), and in the Intel Trust Domain Extensions (TDX)
> architecture it is called TDX Partitioning. The most privileged part of
> a CoCo guest is referred to as running at VMPL0 for AMD SEV-SNP and as
> L1 for Intel TDX Partitioning. This privilege level has full control
> over the other components running inside a CoCo guest, and some
> operations are only allowed to be executed by the SW running at this
> privilege level. The assumption is that
> this level is used for a Virtual Machine Monitor (VMM)/Hypervisor like KVM
> and others or a lightweight Service Manager (SM) like coconut-SVSM [3].
I'm not sure what you mean about the level being used for a
VMM/hypervisor, since they are running in the host. Coconut-SVSM is
correct, since it is running within the guest context.
> The actual workload VM (together with its OS) is expected to run at a
> different privilege level (!VMPL0 in the AMD case and the L2 layer in
> the Intel case). Both architectures, in our current understanding
> (please correct if this is not true for AMD), allow for different
> workload VM options, ranging from a fully unmodified legacy OS to a
> fully enabled/enlightened AMD SEV-SNP/Intel TDX guest and anything in
> between. However, each workload guest
I'm not sure about the "anything in between" aspect. I would think that if
the guest is enlightened it would be fully enlightened or not at all. It
would be difficult to try to decide what operations should be sent to the
SVSM to handle, and how that would occur if the guest OS is unaware of the
SVSM protocol to use. If it is aware of the protocol, then it would just
use it.
For the unenlightened guest, it sounds like more of a para-visor approach
being used, where the guest wouldn't know that control was ever transferred
to the para-visor to handle the event. With SNP, that would be done
through a feature called Reflect-VC. But that means it is an
all-or-nothing action.
> option requires a different level of implementation support from the most
> privileged VMPL0/L1 layer as well as from the workload OS itself (running
> at !VMPL0/L2) and also has different effects on overall performance and
> other factors. Linux, being one of the workload OSes, currently doesn't
> define a common notion or interfaces for such a special type of CoCo guest,
> and there is a risk that each vendor duplicates a lot of common concepts
> inside AMD SEV-SNP or Intel TDX specific code. This is not the approach
> Linux usually prefers, and a vendor-agnostic solution should be explored first.
>
> So this is an attempt to start a joint discussion on how/what/if we can unify
> in this space. Following the recent lkml thread [1], it seems we need
> to first clarify how we see this special !VMPL0/L2 guest and whether we
> can or need to define a common notion for it.
> The following options are *theoretically* possible:
>
> 1. Keep the !VMPL0/L2 guest as an unmodified AMD SEV-SNP/Intel TDX guest
> and hide all complexity inside the VMPL0/L1 VMM and/or respective Intel/AMD
> architecture internal components. This likely creates additional complexity
> in the implementation of VMPL0/L1 layer compared to other options below.
> This option also doesn’t allow service providers to unify their interfaces
> between AMD/Intel solutions, but requires their VMPL0/L1 layer to handle
> differences between these guests. On the plus side, this option requires no
> changes in existing AMD SEV-SNP/Intel TDX Linux guest code to support
> !VMPL0/L2 guest. The big open question we have here for AMD folks is
> whether it is architecturally feasible for you to support this case?
It is architecturally feasible to support this, but it would come with a
performance penalty. For SNP, all #VC exceptions would be routed back to
the HV, into the SVSM/para-visor to be processed, back to the HV and
finally back to the guest. While we would expect some operations, such as
PVALIDATE, to have to make this kind of exchange, operations such as CPUID
or MSR accesses would suffer.
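For the operations that do get forwarded, the request encoding itself is
simple: as I read the SVSM spec, the protocol number goes in the upper 32
bits of RAX and the call identifier in the lower 32 bits. A rough sketch
(the specific protocol/call values below are illustrative assumptions, and
all of the VMGEXIT/calling-area plumbing is omitted):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative, assumed values; not taken from the spec text. */
#define SVSM_CORE_PROTOCOL	0u	/* assumed core protocol number */
#define SVSM_PVALIDATE_CALL	1u	/* assumed call id for page validation */

/* Pack protocol (bits 63:32) and call id (bits 31:0) into RAX. */
static uint64_t svsm_build_rax(uint32_t protocol, uint32_t call_id)
{
	return ((uint64_t)protocol << 32) | call_id;
}

/* Extract the protocol number from an RAX request value. */
static uint32_t svsm_rax_protocol(uint64_t rax)
{
	return (uint32_t)(rax >> 32);
}

/* Extract the call identifier from an RAX request value. */
static uint32_t svsm_rax_call(uint64_t rax)
{
	return (uint32_t)rax;
}
```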
>
> 2. Keep it as an Intel TDX/AMD SEV-SNP guest with some Linux guest internal
> code logic to handle whether it runs at L1 vs L2 / VMPL0 vs !VMPL0.
> This is essentially what this patch series is doing for AMD.
> This option potentially creates many if statements inside the respective Linux
> implementation of these technologies to handle the differences, complicates
> the code, and doesn’t allow service providers to unify their L1/VMPL0 code.
> This option was also previously proposed for Intel TDX in this lkml thread [1]
> and got a negative initial reception.
I think the difference here is that the guest would still be identified as
an SNP guest and still use all of the memory encryption and #VC handling
it does today. It is just specific VMPL0-only operations that would need
to be performed by the SVSM instead of by the guest.
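A stubbed sketch of that split; the enum and helper names here are
hypothetical, and the real series has its own naming and issues the actual
instructions and SVSM protocol calls:

```c
#include <assert.h>

/* The VMPL0-only operations named in the cover letter. */
enum vmpl0_op { OP_PVALIDATE, OP_RMPADJUST_VMSA };
enum op_path  { PATH_DIRECT, PATH_SVSM_CALL };

/*
 * Only VMPL0 may execute these operations directly; any other VMPL has
 * to forward them to the SVSM. Everything else (memory encryption, #VC
 * handling, etc.) stays as it is today.
 */
static enum op_path pick_path(enum vmpl0_op op, unsigned int vmpl)
{
	(void)op;	/* both listed operations are VMPL0-only */
	return (vmpl == 0) ? PATH_DIRECT : PATH_SVSM_CALL;
}
```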
>
> 3. Keep it as a legacy non-CoCo guest. This option is very bad from a
> performance point of view, since all I/O must be done via the VMPL0/L1
> layer, and it is considered infeasible/unacceptable by service providers
> (the performance of networking and disk is horrible). It also requires an
> extensive implementation in the VMPL0/L1 layer to support emulation of all
> devices.
>
> 4. Define a new guest abstraction/guest type that would be used for the
> !VMPL0/L2 guest. This would allow defining, in the future, a unified
> L2 <-> L1 / !VMPL0 <-> VMPL0 communication interface that underneath
> would use Intel TDX/AMD SEV-SNP specified communication primitives. Out
> of the existing Linux code, this approach is followed to some initial
> degree by the MSFT Hyper-V implementation [2]. It defines a new type of
> virtualized guest with its own initialization path and callbacks in
> x86_platform.guest/hyper.*. However, in our understanding no one has yet
> attempted to define a unified abstraction for such a guest, nor a
> unified interface.
> AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0 communication interface
> which is AMD specific.
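To make option 4 concrete, one hypothetical shape for such a unified
abstraction would be an ops table dispatched per vendor; the structure,
names and signatures below are invented for illustration, and neither
vendor defines such an interface today:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical vendor-agnostic ops table for the calls a !VMPL0/L2
 * guest would route to its privileged VMPL0/L1 layer.
 */
struct coco_priv_ops {
	/* Validate or invalidate a guest page via the privileged layer. */
	int (*validate_page)(uint64_t paddr, bool validate);
	/* Create a vCPU context, e.g. a VMSA page for SNP AP bringup. */
	int (*create_vcpu)(unsigned int cpu, uint64_t ctx_paddr);
};

/* Stub SNP backend: in reality this would wrap SVSM protocol calls. */
static int snp_validate_page(uint64_t paddr, bool validate)
{
	(void)paddr; (void)validate;
	return 0;	/* stub: pretend the SVSM call succeeded */
}

static int snp_create_vcpu(unsigned int cpu, uint64_t ctx_paddr)
{
	(void)cpu; (void)ctx_paddr;
	return 0;	/* stub */
}

static const struct coco_priv_ops snp_ops = {
	.validate_page	= snp_validate_page,
	.create_vcpu	= snp_create_vcpu,
};
```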
Can TDX create a new protocol within the SVSM that it could use?
Thanks,
Tom
>
> 5. Is anything else missing?
>
> References:
>
> [1] https://lkml.org/lkml/2023/11/22/1089
>
> [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
> partitioning guest:
> https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
>
> [3] https://github.com/coconut-svsm/svsm
>
> [4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf
>
>