Message-ID: <14e68dd8-b2fa-496f-8dfc-a883ad8434f5@redhat.com>
Date: Tue, 28 May 2024 17:19:34 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Alexander Graf <graf@...zon.com>, Stefano Garzarella
<sgarzare@...hat.com>, Alexander Graf <agraf@...raf.de>
Cc: Dorjoy Chowdhury <dorjoychy111@...il.com>,
virtualization@...ts.linux.dev, kvm@...r.kernel.org, netdev@...r.kernel.org,
stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
vhost driver
On 5/27/24 09:54, Alexander Graf wrote:
>
> On 27.05.24 09:08, Alexander Graf wrote:
>> Hey Stefano,
>>
>> On 23.05.24 10:45, Stefano Garzarella wrote:
>>> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>> Howdy,
>>>>
>>>> On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>> Hey Stefano,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>>
>>>>> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella
>>>>> <sgarzare@...hat.com> wrote:
>>>>>> Hi Dorjoy,
>>>>>>
>>>>>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>> emulation support in QEMU. Alexander Graf is mentoring me on this
>>>>>>> work. A v1 patch series has already been posted to the qemu-devel
>>>>>>> mailing list[2].
>>>>>>>
>>>>>>> AWS Nitro Enclaves is an Amazon EC2[3] feature that allows creating
>>>>>>> isolated execution environments, called enclaves, from Amazon EC2
>>>>>>> instances, which are used for processing highly sensitive data.
>>>>>>> Enclaves have no persistent storage and no external networking. The
>>>>>>> enclave VMs are based on the Firecracker microVM and have a
>>>>>>> vhost-vsock device for communication with the parent EC2 instance
>>>>>>> that spawned them, and a Nitro Secure Module (NSM) device for
>>>>>>> cryptographic attestation. The parent instance VM always has CID 3,
>>>>>>> while the enclave VM gets a dynamic CID. The enclave VMs can
>>>>>>> communicate with the parent instance over various ports to CID 3; for
>>>>>>> example, the init process inside an enclave sends a heartbeat to port
>>>>>>> 9000 upon boot and expects a heartbeat reply, which lets the parent
>>>>>>> instance know that the enclave VM has successfully booted.
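A host-side responder for that boot heartbeat could look roughly like the Python sketch below. It assumes the reply may simply echo the bytes the enclave sent (the thread does not say what the heartbeat payload is), binds to VMADDR_CID_ANY on port 9000, and uses hypothetical function names. Whether the enclave's connection to CID 3 actually reaches it depends on the forwarding discussed in this thread.

```python
import socket

HEARTBEAT_PORT = 9000  # port named in the message above


def answer_heartbeat(conn: socket.socket) -> bytes:
    """Read one heartbeat from conn and echo it back as the reply.

    Assumption: mirroring the received bytes is an acceptable reply; the
    real heartbeat payload is not specified in this thread.
    """
    data = conn.recv(64)
    if not data:
        raise ConnectionError("peer closed before sending a heartbeat")
    conn.sendall(data)
    return data


def serve_heartbeat_once(port: int = HEARTBEAT_PORT) -> bytes:
    """Accept a single vsock connection on `port` and answer it (Linux only)."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as srv:
        srv.bind((socket.VMADDR_CID_ANY, port))
        srv.listen(1)
        conn, _peer = srv.accept()
        with conn:
            return answer_heartbeat(conn)
```

Splitting the echo logic from the vsock listener keeps the part that matters testable over any stream socket.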
>>>>>>>
>>>>>>> The plan is to eventually make the Nitro Enclave emulation in QEMU
>>>>>>> standalone, i.e., without needing to run another VM with CID 3 with
>>>>>>> proper vsock
>>>>>> If you don't have to launch another VM, maybe we can avoid vhost-vsock
>>>>>> and emulate virtio-vsock in user space, having complete control over
>>>>>> the behavior.
>>>>>>
>>>>>> So we could use this opportunity to implement virtio-vsock in QEMU [4]
>>>>>> or use vhost-user-vsock [5] and customize it somehow.
>>>>>> (Note: vhost-user-vsock already supports sibling communication, so
>>>>>> maybe with a few modifications it fits your case perfectly.)
>>>>>>
>>>>>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>> [5]
>>>>>> https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
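With a user-space device model, every packet header passes through code you control, so CIDs can be rewritten freely before a packet is delivered. As a rough illustration, the sketch below packs and unpacks the 44-byte little-endian header laid out as `struct virtio_vsock_hdr` in the virtio specification. The `VsockHdr` class and its helpers are hypothetical, not QEMU or vhost-device API.

```python
import struct
from dataclasses import dataclass

# struct virtio_vsock_hdr: src_cid, dst_cid (le64); src_port, dst_port,
# len (le32); type, op (le16); flags, buf_alloc, fwd_cnt (le32); packed.
VSOCK_HDR = struct.Struct("<QQIIIHHIII")  # 44 bytes, no padding


@dataclass
class VsockHdr:
    src_cid: int
    dst_cid: int
    src_port: int
    dst_port: int
    len: int        # payload length in bytes
    type: int       # 1 = VIRTIO_VSOCK_TYPE_STREAM
    op: int         # e.g. 1 = VIRTIO_VSOCK_OP_REQUEST
    flags: int
    buf_alloc: int
    fwd_cnt: int

    def pack(self) -> bytes:
        return VSOCK_HDR.pack(
            self.src_cid, self.dst_cid, self.src_port, self.dst_port,
            self.len, self.type, self.op, self.flags,
            self.buf_alloc, self.fwd_cnt,
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "VsockHdr":
        # Only the first 44 bytes are the header; any payload follows it.
        return cls(*VSOCK_HDR.unpack_from(raw))
```

A device model could then, for example, set `dst_cid` to a different value on a parsed header and re-pack it before handing the packet to the other side.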
>>>>>
>>>>>
>>>>> Thanks for letting me know. Right now I don't have a complete picture
>>>>> but I will look into them. Thank you.
>>>>>>
>>>>>>
>>>>>>> communication support. For this to work, one approach could be to
>>>>>>> teach the vhost driver in the kernel to forward CID 3 messages to
>>>>>>> another CID N
>>>>>> So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>> process?
>>>>>
>>>>>
>>>>> CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>> parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>> an EC2 instance VM spawns the enclave VM from inside it and that
>>>>> parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>> don't want a parent VM (we want to run enclave VMs standalone) we
>>>>> would need to forward the CID 3 messages to the host CID. I don't know if
>>>>> it means CID 3 and CID N are assigned to the same QEMU process. Sorry.
>>>>
>>>>
>>>> There are 2 use cases here:
>>>>
>>>> 1) Enclave wants to treat host as parent (default). In this scenario,
>>>> the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>> really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>> should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>> connect to the Enclave on port X, it should look as if it originates
>>>> from CID 3, not CID 0.
>>>>
>>>> 2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>> Here, we have multiple "parent instances". Each of them thinks it's
>>>> CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>> parent. For this case, I think implementing all of virtio-vsock in
>>>> user space is the best path forward. But in theory, you could also
>>>> swizzle CIDs to make random "real" CIDs appear as CID 3.
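For case 2, the swizzling could be modelled as a per-enclave table that maps each enclave CID to the real CID of its parent. The sketch below is hypothetical (`CidSwizzler` and its methods are illustrative names) and only rewrites the `(src_cid, dst_cid)` pair of a packet; ports, credit accounting, and a real VM that owns CID 3 are ignored.

```python
from typing import Dict, Tuple

PARENT_CID = 3  # the CID every enclave uses to reach its parent


class CidSwizzler:
    """Hypothetical CID rewriting for the multi-parent case."""

    def __init__(self) -> None:
        self._parent_of: Dict[int, int] = {}  # enclave CID -> real parent CID

    def add_enclave(self, enclave_cid: int, parent_cid: int) -> None:
        self._parent_of[enclave_cid] = parent_cid

    def from_enclave(self, src: int, dst: int) -> Tuple[int, int]:
        """Enclave -> outside: CID 3 is redirected to that enclave's parent."""
        if dst == PARENT_CID and src in self._parent_of:
            dst = self._parent_of[src]
        return src, dst

    def to_enclave(self, src: int, dst: int) -> Tuple[int, int]:
        """Outside -> enclave: only the enclave's own parent shows up as CID 3."""
        if self._parent_of.get(dst) == src:
            src = PARENT_CID
        return src, dst
```

Keeping the table keyed by enclave CID means each parent stays invisible to enclaves it did not spawn.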
>>>>
>>>
>>> Thank you for clarifying the use cases!
>>>
>>> Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>> it's easier to go into user-space with vhost-user-vsock or the built-in
>>> device.
>>
>>
>> Sorry, I believe I meant CID 2. Effectively for case 1, when a process
>> on the hypervisor listens on port 1234, it should be visible as 3:1234
>> from the VM and when the hypervisor process connects to <VM CID>:1234,
>> it should look as if that connection came from CID 3.
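The case 1 mapping described here reduces to two address rewrites, one per direction, with ports left untouched. This is a sketch only: `Addr`, `guest_to_host`, `host_to_guest` and `NITRO_PARENT_CID` are hypothetical names, and CID 2 is `VMADDR_CID_HOST`.

```python
from typing import NamedTuple

VMADDR_CID_HOST = 2    # the well-known host CID in Linux vsock
NITRO_PARENT_CID = 3   # the CID the enclave expects its parent to have


class Addr(NamedTuple):
    cid: int
    port: int


def guest_to_host(dst: Addr) -> Addr:
    """Enclave connects to 3:port, so deliver to the host process on 2:port."""
    if dst.cid == NITRO_PARENT_CID:
        return Addr(VMADDR_CID_HOST, dst.port)
    return dst


def host_to_guest(src: Addr) -> Addr:
    """Host connects to <VM CID>:port, so the enclave sees it come from CID 3."""
    if src.cid == VMADDR_CID_HOST:
        return Addr(NITRO_PARENT_CID, src.port)
    return src
```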
>
>
> Now that I'm thinking about my message again: What if we just introduce
> a sysfs/sysctl file for vsock that indicates the "host CID" (default:
> 2)? Users that want vhost-vsock to behave as if the host is CID 3 can
> just write 3 to it.
>
> It means we'd need to change all references to VMADDR_CID_HOST to
> instead refer to a global variable that indicates the new "host CID".
> It'd need some more careful massaging to not break number namespace
> assumptions (<= CID_HOST no longer works), but the idea should fly.
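The prefix assumption mentioned here can be shown with a toy model. Below, `host_cid` stands in for the proposed sysfs/sysctl knob (hypothetical; modelled as a plain parameter), and the two hypothetical predicates decide whether a CID is reserved. They agree for the default of 2, but once the host CID is 3 the order-based check still reserves CID 2, which is the kind of call site that would need reviewing one by one.

```python
VMADDR_CID_HYPERVISOR = 0
VMADDR_CID_LOCAL = 1
VMADDR_CID_HOST = 2  # today's fixed host CID, the proposed default


def reserved_by_order(cid: int, host_cid: int = VMADDR_CID_HOST) -> bool:
    # Numeric-prefix assumption: every CID up to the host CID is special.
    return cid <= host_cid


def reserved_by_name(cid: int, host_cid: int = VMADDR_CID_HOST) -> bool:
    # Spell out the special CIDs instead of relying on their ordering.
    return cid in (VMADDR_CID_HYPERVISOR, VMADDR_CID_LOCAL, host_cid)
```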
Forwarding one or more ports of a given CID to CID 2 (the host) should
be doable with a dummy vhost client that listens on CID 3, connects to
CID 2 and sends data back and forth. That is not hard enough to justify
changing all references to VMADDR_CID_HOST. I am also not sure whether
vsock supports network namespaces; if it does, the sysctl/sysfs approach
is not feasible, because you cannot set it per-netns, can you? It also
has the disadvantage that different QEMU instances are not insulated
from each other.
I think it's either that or implementing virtio-vsock in userspace
(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
search for "To connect host<->guest").
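The "send data back and forth" part of the dummy-client approach is an ordinary bidirectional relay. The Python sketch below shows only that part, as a hypothetical `relay()` over two already-connected stream sockets; registering as CID 3 through /dev/vhost-vsock and servicing the virtqueues is not shown.

```python
import selectors
import socket


def relay(a: socket.socket, b: socket.socket, bufsize: int = 4096) -> int:
    """Copy bytes between a and b in both directions until both reach EOF.

    Returns the number of bytes forwarded. Uses blocking sendall() for
    simplicity, so one slow peer can stall the other direction.
    """
    peer = {a: b, b: a}
    sel = selectors.DefaultSelector()
    for s in (a, b):
        sel.register(s, selectors.EVENT_READ)
    open_directions = 2
    total = 0
    try:
        while open_directions:
            for key, _events in sel.select():
                src = key.fileobj
                dst = peer[src]
                data = src.recv(bufsize)
                if data:
                    dst.sendall(data)
                    total += len(data)
                    continue
                # EOF on src: stop reading it and pass the half-close on.
                sel.unregister(src)
                open_directions -= 1
                try:
                    dst.shutdown(socket.SHUT_WR)
                except OSError:
                    pass  # dst may already be fully closed
    finally:
        sel.close()
    return total
```

Forwarding each half-close separately lets either side finish sending while still reading the other side's reply.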
Paolo