Message-ID: <c5wziphzhyoqb2mwzd2rstpotjqr3zky6hrgysohwsum4wvgi7@qmboatooyddd>
Date: Tue, 28 May 2024 17:41:46 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Alexander Graf <graf@...zon.com>, Alexander Graf <agraf@...raf.de>,
Dorjoy Chowdhury <dorjoychy111@...il.com>, virtualization@...ts.linux.dev, kvm@...r.kernel.org,
netdev@...r.kernel.org, stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
vhost driver
On Tue, May 28, 2024 at 05:19:34PM GMT, Paolo Bonzini wrote:
>On 5/27/24 09:54, Alexander Graf wrote:
>>
>>On 27.05.24 09:08, Alexander Graf wrote:
>>>Hey Stefano,
>>>
>>>On 23.05.24 10:45, Stefano Garzarella wrote:
>>>>On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>>>Howdy,
>>>>>
>>>>>On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>>>Hey Stefano,
>>>>>>
>>>>>>Thanks for the reply.
>>>>>>
>>>>>>
>>>>>>On Mon, May 20, 2024, 2:55 PM Stefano Garzarella
>>>>>><sgarzare@...hat.com> wrote:
>>>>>>>Hi Dorjoy,
>>>>>>>
>>>>>>>On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>Hope you are doing well. I am working on adding AWS Nitro
>>>>>>>>Enclave[1] emulation support in QEMU. Alexander Graf is
>>>>>>>>mentoring me on this work. A v1 patch series has already been
>>>>>>>>posted to the qemu-devel mailing list[2].
>>>>>>>>
>>>>>>>>AWS Nitro Enclaves is an Amazon EC2[3] feature that allows
>>>>>>>>creating isolated execution environments, called enclaves,
>>>>>>>>from Amazon EC2 instances, which are used for processing
>>>>>>>>highly sensitive data. Enclaves have no persistent storage and
>>>>>>>>no external networking. The enclave VMs are based on the
>>>>>>>>Firecracker microvm and have a vhost-vsock device for
>>>>>>>>communication with the parent EC2 instance that spawned them,
>>>>>>>>and a Nitro Secure Module (NSM) device for cryptographic
>>>>>>>>attestation. The parent instance VM always has CID 3 while the
>>>>>>>>enclave VM gets a dynamic CID. The enclave VMs can communicate
>>>>>>>>with the parent instance over various ports to CID 3; for
>>>>>>>>example, the init process inside an enclave sends a heartbeat
>>>>>>>>to port 9000 upon boot, expecting a heartbeat reply, letting
>>>>>>>>the parent instance know that the enclave VM has successfully
>>>>>>>>booted.
>>>>>>>>
>>>>>>>>The plan is to eventually make the Nitro Enclave emulation in
>>>>>>>>QEMU standalone, i.e., without needing to run another VM with
>>>>>>>>CID 3 with proper vsock
>>>>>>>If you don't have to launch another VM, maybe we can avoid
>>>>>>>vhost-vsock
>>>>>>>and emulate virtio-vsock in user-space, having complete
>>>>>>>control over the
>>>>>>>behavior.
>>>>>>>
>>>>>>>So we could use this opportunity to implement virtio-vsock
>>>>>>>in QEMU [4]
>>>>>>>or use vhost-user-vsock [5] and customize it somehow.
>>>>>>>(Note: vhost-user-vsock already supports sibling
>>>>>>>communication, so maybe
>>>>>>>with a few modifications it fits your case perfectly)
>>>>>>>
>>>>>>>[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>>>[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>>
>>>>>>
>>>>>>Thanks for letting me know. Right now I don't have a complete picture
>>>>>>but I will look into them. Thank you.
>>>>>>>
>>>>>>>
>>>>>>>>communication support. For this to work, one approach could be
>>>>>>>>to teach the vhost driver in the kernel to forward CID 3
>>>>>>>>messages to another CID N
>>>>>>>So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>>>process?
>>>>>>
>>>>>>
>>>>>>CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>>>parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>>>an EC2 instance VM spawns the enclave VM from inside it and that
>>>>>>parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>>>don't want a parent VM (we want to run enclave VMs standalone) we
>>>>>>would need to forward the CID 3 messages to host CID. I don't know if
>>>>>>it means CID 3 and CID N are assigned to the same QEMU process. Sorry.
>>>>>
>>>>>
>>>>>There are 2 use cases here:
>>>>>
>>>>>1) Enclave wants to treat host as parent (default). In this scenario,
>>>>>the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>>>really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>>>should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>>>connect to the Enclave on port X, it should look as if it originates
>>>>>from CID 3, not CID 0.
>>>>>
>>>>>2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>>>Here, we have multiple "parent instances". Each of them thinks it's
>>>>>CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>>>parent. For this case, I think implementing all of virtio-vsock in
>>>>>user space is the best path forward. But in theory, you could also
>>>>>swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>>
>>>>
>>>>Thank you for clarifying the use cases!
>>>>
>>>>Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>>>it's easier to go into user-space with vhost-user-vsock or the built-in
>>>>device.
>>>
>>>
>>>Sorry, I believe I meant CID 2. Effectively for case 1, when a
>>>process on the hypervisor listens on port 1234, it should be
>>>visible as 3:1234 from the VM and when the hypervisor process
>>>connects to <VM CID>:1234, it should look as if that connection
>>>came from CID 3.
>>
>>
>>Now that I'm thinking about my message again: What if we just
>>introduce a sysfs/sysctl file for vsock that indicates the "host
>>CID" (default: 2)? Users that want vhost-vsock to behave as if the
>>host is CID 3 can just write 3 to it.
>>
>>It means we'd need to change all references to VMADDR_CID_HOST to
>>instead refer to a global variable that indicates the new "host
>>CID". It'd need some more careful massaging to not break number
>>namespace assumptions (<= CID_HOST no longer works), but the idea
>>should fly.
>
>Forwarding one or more ports of a given CID to CID 2 (the host) should
>be doable with a dummy vhost client that listens to CID 3, connects to
>CID 2 and sends data back and forth.
Good idea: a kind of socat, but one that can handle /dev/vhost-vsock.
With the rust-vmm crates it should be doable, but I think we would
still need to extend vhost-vsock to support VMADDR_FLAG_TO_HOST,
because for now it does not allow guests to send packets to the host
with a destination CID other than 2.
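
To make that concrete, the host-facing half of such a forwarder could
look roughly like the sketch below (plain C, untested, only the relay
part; the vhost side that actually claims CID 3 and services the
virtqueues, e.g. with the rust-vmm vhost crates, is the hard part and
is not shown; port 9000 is just the heartbeat example from above):

/*
 * Minimal sketch (untested): once the backend claiming CID 3 has
 * accepted a stream from the enclave guest, open an AF_VSOCK
 * connection to the real host (CID 2) and copy bytes both ways.
 */
#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Connect to the host (CID 2) on the given vsock port, e.g. 9000. */
static int connect_to_host(unsigned int port)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid = VMADDR_CID_HOST,     /* 2 */
        .svm_port = port,
    };
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock connect");
        close(fd);
        return -1;
    }
    return fd;
}

/* Copy data between the guest-side fd and the host-side fd until EOF. */
static void relay(int guest_fd, int host_fd)
{
    struct pollfd fds[2] = {
        { .fd = guest_fd, .events = POLLIN },
        { .fd = host_fd,  .events = POLLIN },
    };
    char buf[4096];

    for (;;) {
        if (poll(fds, 2, -1) < 0)
            return;
        for (int i = 0; i < 2; i++) {
            if (!(fds[i].revents & (POLLIN | POLLHUP)))
                continue;
            ssize_t n = read(fds[i].fd, buf, sizeof(buf));
            if (n <= 0)
                return;                 /* peer closed or error */
            if (write(fds[1 - i].fd, buf, n) != n)
                return;
        }
    }
}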
>Not hard enough to justify changing all references to VMADDR_CID_HOST
I agree.
>(and also I am not sure if vsock supports network namespaces?
Nope, I had been working on it, but I could never finish it :-(
Tracking the work here: https://gitlab.com/vsock/vsock/-/issues/2
>then the sysctl/sysfs way is not feasible because you cannot set it
>per-netns, can you?). It also has the disadvantage that different
>QEMU instances are not isolated from each other.
>
>I think it's either that or implementing virtio-vsock in userspace (https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/,
>search for "To connect host<->guest").
Because in this case AF_VSOCK can't be used on the host, right?
So it's similar to vhost-user-vsock.
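
For reference, host applications would then reach the guest through a
Unix socket exposed by the device rather than through AF_VSOCK. A
rough sketch, assuming the Firecracker-style "hybrid vsock" handshake
("CONNECT <port>" / "OK <port>") that vhost-device-vsock also
implements; the socket path is just a placeholder:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Sketch (untested): connect to the device's Unix socket and ask it to
 * open a connection to <port> inside the guest. After the "OK" reply
 * the fd carries the guest connection, so plain read()/write() works.
 */
static int connect_to_guest_port(const char *uds_path, unsigned int port)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    char req[64], reply[64];
    ssize_t n;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    strncpy(addr.sun_path, uds_path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto err;

    snprintf(req, sizeof(req), "CONNECT %u\n", port);
    if (write(fd, req, strlen(req)) < 0)
        goto err;
    n = read(fd, reply, sizeof(reply) - 1);
    if (n < 3 || strncmp(reply, "OK ", 3) != 0)
        goto err;

    return fd;
err:
    close(fd);
    return -1;
}

/* e.g.: int fd = connect_to_guest_port("/tmp/vm.vsock", 9000); */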
Thanks,
Stefano