[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6789d559-cb57-24b5-2c4a-57c6d18133a2@oracle.com>
Date: Tue, 28 Apr 2020 19:01:45 +0300
From: Liran Alon <liran.alon@...cle.com>
To: Alexander Graf <graf@...zon.de>,
"Paraschiv, Andra-Irina" <andraprs@...zon.com>,
Paolo Bonzini <pbonzini@...hat.com>,
linux-kernel@...r.kernel.org
Cc: Anthony Liguori <aliguori@...zon.com>,
Benjamin Herrenschmidt <benh@...zon.com>,
Colm MacCarthaigh <colmmacc@...zon.com>,
Bjoern Doebel <doebel@...zon.de>,
David Woodhouse <dwmw@...zon.co.uk>,
Frank van der Linden <fllinden@...zon.com>,
Martin Pohlack <mpohlack@...zon.de>,
Matt Wilson <msw@...zon.com>, Balbir Singh <sblbir@...zon.com>,
Stewart Smith <trawets@...zon.com>,
Uwe Dannowski <uwed@...zon.de>, kvm@...r.kernel.org,
ne-devel-upstream@...zon.com
Subject: Re: [PATCH v1 00/15] Add support for Nitro Enclaves
On 28/04/2020 18:25, Alexander Graf wrote:
>
>
> On 27.04.20 13:44, Liran Alon wrote:
>>
>> On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
>>>
>>> On 25/04/2020 18:25, Liran Alon wrote:
>>>>
>>>> On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
>>>>>
>>>>> The memory and CPUs are carved out of the primary VM, they are
>>>>> dedicated for the enclave. The Nitro hypervisor running on the host
>>>>> ensures memory and CPU isolation between the primary VM and the
>>>>> enclave VM.
>>>> I hope you properly take into consideration Hyper-Threading
>>>> speculative side-channel vulnerabilities here.
>>>> i.e. Usually cloud providers designate each CPU core to be assigned
>>>> to run only vCPUs of specific guest. To avoid sharing a single CPU
>>>> core between multiple guests.
>>>> To handle this properly, you need to use some kind of core-scheduling
>>>> mechanism (Such that each CPU core either runs only vCPUs of enclave
>>>> or only vCPUs of primary VM at any given point in time).
>>>>
>>>> In addition, can you elaborate more on how the enclave memory is
>>>> carved out of the primary VM?
>>>> Does this involve performing a memory hot-unplug operation from
>>>> primary VM or just unmap enclave-assigned guest physical pages from
>>>> primary VM's SLAT (EPT/NPT) and map them now only in enclave's SLAT?
>>>
>>> Correct, we take into consideration the HT setup. The enclave gets
>>> dedicated physical cores. The primary VM and the enclave VM don't run
>>> on CPU siblings of a physical core.
>> The way I would imagine this to work is that Primary-VM just specifies
>> how many vCPUs will the Enclave-VM have and those vCPUs will be set with
>> affinity to run on same physical CPU cores as Primary-VM.
>> But with the exception that scheduler is modified to not run vCPUs of
>> Primary-VM and Enclave-VM as sibling on the same physical CPU core
>> (core-scheduling). i.e. This is different than primary-VM losing
>> physical CPU cores permanently as long as the Enclave-VM is running.
>> Or maybe this should even be controlled by a knob in virtual PCI device
>> interface to allow flexibility to customer to decide if Enclave-VM needs
>> dedicated CPU cores or is it ok to share them with Primary-VM
>> as long as core-scheduling is used to guarantee proper isolation.
>
> Running both parent and enclave on the same core can *potentially*
> lead to L2 cache leakage, so we decided not to go with it :).
Haven't thought about the L2 cache. Makes sense. Ack.
>
>>>
>>> Regarding the memory carve out, the logic includes page table entries
>>> handling.
>> As I thought. Thanks for conformation.
>>>
>>> IIRC, memory hot-unplug can be used for the memory blocks that were
>>> previously hot-plugged.
>>>
>>> https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/admin-guide/mm/memory-hotplug.html__;!!GqivPVa7Brio!MubgaBjJabDtNzNpdOxxbSKtLbqXHbsEpTtZ1mj-rnfLvMIbLW1nZ8cK10GhYJQ$
>>>
>>>
>>>>
>>>> I don't quite understand why Enclave VM needs to be
>>>> provisioned/teardown during primary VM's runtime.
>>>>
>>>> For example, an alternative could have been to just provision both
>>>> primary VM and Enclave VM on primary VM startup.
>>>> Then, wait for primary VM to setup a communication channel with
>>>> Enclave VM (E.g. via virtio-vsock).
>>>> Then, primary VM is free to request Enclave VM to perform various
>>>> tasks when required on the isolated environment.
>>>>
>>>> Such setup will mimic a common Enclave setup. Such as Microsoft
>>>> Windows VBS EPT-based Enclaves (That all runs on VTL1). It is also
>>>> similar to TEEs running on ARM TrustZone.
>>>> i.e. In my alternative proposed solution, the Enclave VM is similar
>>>> to VTL1/TrustZone.
>>>> It will also avoid requiring introducing a new PCI device and driver.
>>>
>>> True, this can be another option, to provision the primary VM and the
>>> enclave VM at launch time.
>>>
>>> In the proposed setup, the primary VM starts with the initial
>>> allocated resources (memory, CPUs). The launch path of the enclave VM,
>>> as it's spawned on the same host, is done via the ioctl interface -
>>> PCI device - host hypervisor path. Short-running or long-running
>>> enclave can be bootstrapped during primary VM lifetime. Depending on
>>> the use case, a custom set of resources (memory and CPUs) is set for
>>> an enclave and then given back when the enclave is terminated; these
>>> resources can be used for another enclave spawned later on or the
>>> primary VM tasks.
>>>
>> Yes, I already understood this is how the mechanism work. I'm
>> questioning whether this is indeed a good approach that should also be
>> taken by upstream.
>
> I thought the point of Linux was to support devices that exist, rather
> than change the way the world works around it? ;)
I agree. Just poking around to see if upstream wants to implement a
different approach for Enclaves, regardless of accepting the Nitro
Enclave virtual PCI driver for AWS use-case of course.
>
>> The use-case of using Nitro Enclaves is for a Confidential-Computing
>> service. i.e. The ability to provision a compute instance that can be
>> trusted to perform a bunch of computation on sensitive
>> information with high confidence that it cannot be compromised as it's
>> highly isolated. Some technologies such as Intel SGX and AMD SEV
>> attempted to achieve this even with guarantees that
>> the computation is isolated from the hardware and hypervisor itself.
>
> Yeah, that worked really well, didn't it? ;)
You haven't seen me saying SGX worked well. :)
AMD SEV though still have it's shot (Once SEV-SNP will be GA).
>
>> I would have expected that for the vast majority of real customer
>> use-cases, the customer will provision a compute instance that runs some
>> confidential-computing task in an enclave which it
>> keeps running for the entire life-time of the compute instance. As the
>> sole purpose of the compute instance is to just expose a service that
>> performs some confidential-computing task.
>> For those cases, it should have been sufficient to just pre-provision a
>> single Enclave-VM that performs this task, together with the compute
>> instance and connect them via virtio-vsock.
>> Without introducing any new virtual PCI device, guest PCI driver and
>> unique semantics of stealing resources (CPUs and Memory) from primary-VM
>> at runtime.
>
> You would also need to preprovision the image that runs in the
> enclave, which is usually only determined at runtime. For that you
> need the PCI driver anyway, so why not make the creation dynamic too?
The image doesn't have to be determined at runtime. It could be supplied
to control-plane. As mentioned below.
>
>> In this Nitro Enclave architecture, we de-facto put Compute
>> control-plane abilities in the hands of the guest VM. Instead of
>> introducing new control-plane primitives that allows building
>> the data-plane architecture desired by the customer in a flexible
>> manner.
>> * What if the customer prefers to have it's Enclave VM polling S3 bucket
>> for new tasks and produce results to S3 as-well? Without having any
>> "Primary-VM" or virtio-vsock connection of any kind?
>> * What if for some use-cases customer wants Enclave-VM to have dedicated
>> compute power (i.e. Not share physical CPU cores with primary-VM. Not
>> even with core-scheduling) but for other
>> use-cases, customer prefers to share physical CPU cores with Primary-VM
>> (Together with core-scheduling guarantees)? (Although this could be
>> addressed by extending the virtual PCI device
>> interface with a knob to control this)
>>
>> An alternative would have been to have the following new control-plane
>> primitives:
>> * Ability to provision a VM without boot-volume, but instead from an
>> Image that is used to boot from memory. Allowing to provision
>> disk-less VMs.
>> (E.g. Can be useful for other use-cases such as VMs not requiring EBS
>> at all which could allow cheaper compute instance)
>> * Ability to provision a group of VMs together as a group such that they
>> are guaranteed to launch as sibling VMs on the same host.
>> * Ability to create a fast-path connection between sibling VMs on the
>> same host with virtio-vsock. Or even also other shared-memory mechanism.
>> * Extend AWS Fargate with ability to run multiple microVMs as a group
>> (Similar to above) connected with virtio-vsock. To allow on-demand scale
>> of confidential-computing task.
>
> Yes, there are a *lot* of different ways to implement enclaves in a
> cloud environment. This is the one that we focused on, but I'm sure
> others in the space will have more ideas. It's definitely an
> interesting space and I'm eager to see more innovation happening :).
>
>> Having said that, I do see a similar architecture to Nitro Enclaves
>> virtual PCI device used for a different purpose: For hypervisor-based
>> security isolation (Such as Windows VBS).
>> E.g. Linux boot-loader can detect the presence of this virtual PCI
>> device and use it to provision multiple VM security domains. Such that
>> when a security domain is created,
>> it is specified what is the hardware resources it have access to (Guest
>> memory pages, IOPorts, MSRs and etc.) and the blob it should run to
>> bootstrap. Similar, but superior than,
>> Hyper-V VSM. In addition, some security domains will be given special
>> abilities to control other security domains (For example, to control the
>> +XS,+XU EPT bits of other security
>> domains to enforce code-integrity. Similar to Windows VBS HVCI). Just an
>> idea... :)
>
> Yes, absolutely! So much fun to be had :D
:)
-Liran
>
>
> Alex
>
>
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
>
>
Powered by blists - more mailing lists