Message-ID: <tg7l3dr2x7jbf4kf6fbl5vforfonx6b4ls7smrimq6fg4tlluc@udyk6ex7ymrr>
Date: Tue, 28 May 2024 16:43:40 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Alexander Graf <graf@...zon.com>
Cc: Alexander Graf <agraf@...raf.de>,
Dorjoy Chowdhury <dorjoychy111@...il.com>, virtualization@...ts.linux.dev, kvm@...r.kernel.org,
netdev@...r.kernel.org, stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
vhost driver
On Mon, May 27, 2024 at 09:54:17AM GMT, Alexander Graf wrote:
>
>On 27.05.24 09:08, Alexander Graf wrote:
>>Hey Stefano,
>>
>>On 23.05.24 10:45, Stefano Garzarella wrote:
>>>On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>>Howdy,
>>>>
>>>>On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>>Hey Stefano,
>>>>>
>>>>>Thanks for the reply.
>>>>>
>>>>>
>>>>>On Mon, May 20, 2024, 2:55 PM Stefano Garzarella
>>>>><sgarzare@...hat.com> wrote:
>>>>>>Hi Dorjoy,
>>>>>>
>>>>>>On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>>Hi,
>>>>>>>
>>>>>>>Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>>emulation support in QEMU. Alexander Graf is mentoring me
>>>>>>>on this work. A v1
>>>>>>>patch series has already been posted to the qemu-devel
>>>>>>>mailing list[2].
>>>>>>>
>>>>>>>AWS nitro enclaves is an Amazon EC2[3] feature that allows
>>>>>>>creating isolated
>>>>>>>execution environments, called enclaves, from Amazon EC2
>>>>>>>instances, which are
>>>>>>>used for processing highly sensitive data. Enclaves have
>>>>>>>no persistent storage
>>>>>>>and no external networking. The enclave VMs are based on
>>>>>>>Firecracker microvm
>>>>>>>and have a vhost-vsock device for communication with the
>>>>>>>parent EC2 instance
>>>>>>>that spawned it and a Nitro Secure Module (NSM) device for
>>>>>>>cryptographic
>>>>>>>attestation. The parent instance VM always has CID 3 while
>>>>>>>the enclave VM gets
>>>>>>>a dynamic CID. The enclave VMs can communicate with the
>>>>>>>parent instance over
>>>>>>>various ports to CID 3, for example, the init process
>>>>>>>inside an enclave sends a
>>>>>>>heartbeat to port 9000 upon boot, expecting a heartbeat
>>>>>>>reply, letting the
>>>>>>>parent instance know that the enclave VM has successfully booted.
>>>>>>>
>>>>>>>The plan is to eventually make the nitro enclave emulation
>>>>>>>in QEMU standalone
>>>>>>>i.e., without needing to run another VM with CID 3 with proper vsock
>>>>>>If you don't have to launch another VM, maybe we can avoid
>>>>>>vhost-vsock
>>>>>>and emulate virtio-vsock in user-space, having complete
>>>>>>control over the
>>>>>>behavior.
>>>>>>
>>>>>>So we could use this opportunity to implement virtio-vsock
>>>>>>in QEMU [4]
>>>>>>or use vhost-user-vsock [5] and customize it somehow.
>>>>>>(Note: vhost-user-vsock already supports sibling
>>>>>>communication, so maybe
>>>>>>with a few modifications it fits your case perfectly)
>>>>>>
>>>>>>[4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>>[5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>
>>>>>
>>>>>Thanks for letting me know. Right now I don't have a complete picture
>>>>>but I will look into them. Thank you.
>>>>>>
>>>>>>
>>>>>>>communication support. For this to work, one approach
>>>>>>>could be to teach the
>>>>>>>vhost driver in kernel to forward CID 3 messages to another CID N
>>>>>>So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>>process?
>>>>>
>>>>>
>>>>>CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>>parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>>an EC2 instance VM spawns the enclave VM from inside it and that
>>>>>parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>>don't want a parent VM (we want to run enclave VMs standalone) we
>>>>>would need to forward the CID 3 messages to host CID. I don't know if
>>>>>it means CID 3 and CID N is assigned to the same QEMU process. Sorry.
>>>>
>>>>
>>>>There are 2 use cases here:
>>>>
>>>>1) Enclave wants to treat host as parent (default). In this
>>>>scenario,
>>>>the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>>really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>>should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>>connect to the Enclave on port X, it should look as if it originates
>>>>from CID 3, not CID 0.
>>>>
>>>>2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>>Here, we have multiple "parent instances". Each of them thinks it's
>>>>CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>>parent. For this case, I think implementing all of virtio-vsock in
>>>>user space is the best path forward. But in theory, you could also
>>>>swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>
>>>
>>>Thank you for clarifying the use cases!
>>>
>>>Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>>it's easier to go into user-space with vhost-user-vsock or the built-in
>>>device.
>>
>>
>>Sorry, I believe I meant CID 2. Effectively for case 1, when a
>>process on the hypervisor listens on port 1234, it should be visible
>>as 3:1234 from the VM and when the hypervisor process connects to
>><VM CID>:1234, it should look as if that connection came from CID 3.
>
>
>Now that I'm thinking about my message again: What if we just introduce
>a sysfs/sysctl file for vsock that indicates the "host CID" (default:
>2)? Users that want vhost-vsock to behave as if the host is CID 3 can
>just write 3 to it.

I'm not sure I fully understand the final use case, so let me try to
summarize it:

You would like to be able to send/receive messages between the host and
a guest as if the host were a sibling VM, i.e. as if it had a CID != 2
(in your case 3). The important point is to use AF_VSOCK in the host
application, so not a unix socket as firecracker does.

Is this correct?

I thought you were using firecracker for this scenario, so it seemed to
make sense to expect user applications to support hybrid vsock.
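
To make the AF_VSOCK side concrete, here is a minimal sketch of what such
a host application could look like. Port 9000 is the heartbeat port
mentioned earlier in the thread; the one-byte echo, the 0xb7 test value
and the names `open_vsock_listener` / `serve_heartbeat` are assumptions
made up for illustration, not something this thread specifies.

```python
import socket

HEARTBEAT_PORT = 9000          # heartbeat port named earlier in this thread
VMADDR_CID_ANY = 0xFFFFFFFF    # same value as VMADDR_CID_ANY in <linux/vm_sockets.h>


def open_vsock_listener(port: int = HEARTBEAT_PORT) -> socket.socket:
    """Listen on an AF_VSOCK stream socket (Linux only; not called in the demo)."""
    srv = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    srv.bind((VMADDR_CID_ANY, port))
    srv.listen(1)
    return srv


def serve_heartbeat(conn: socket.socket) -> bytes:
    """Read one byte and echo it back (ASSUMED heartbeat format)."""
    beat = conn.recv(1)
    if beat:
        conn.sendall(beat)
    return beat


# Demo over a local socket pair, so the handler runs without any guest.
guest_side, host_side = socket.socketpair()
guest_side.sendall(b"\xb7")            # arbitrary test byte (assumption)
received = serve_heartbeat(host_side)
reply = guest_side.recv(1)
guest_side.close()
host_side.close()
```

With a real guest, the handler would be fed from
`conn, (peer_cid, peer_port) = open_vsock_listener().accept()`; the open
question in this thread is which CID the guest has to dial to reach it.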
>
>It means we'd need to change all references to VMADDR_CID_HOST to
>instead refer to a global variable that indicates the new "host CID".
>It'd need some more careful massaging to not break number namespace
>assumptions (<= CID_HOST no longer works), but the idea should fly.
>
>That would give us all 3 options:
>
>1) User sets vsock.host_cid = 3 to simulate that the host is in
>reality an enclave parent
>2) User spawns VM with CID = 3 to run parent payload inside
>3) User spawns parent and enclave VMs with vhost-user-vsock which
>creates its own CID namespace
>
>
>Stefano, WDYT?
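
To illustrate the "<= CID_HOST no longer works" concern, here is a toy
model (plain Python, not kernel code). The three constants mirror the
well-known CIDs in <linux/vm_sockets.h>; the two helper functions are
hypothetical names, and whether CID 2 should stay special once the host
CID is configurable is a design question this sketch does not answer.

```python
VMADDR_CID_HYPERVISOR = 0
VMADDR_CID_LOCAL = 1
VMADDR_CID_HOST = 2  # today's fixed host CID


def is_well_known_naive(cid: int, host_cid: int) -> bool:
    """Range check in the style of `cid <= VMADDR_CID_HOST`."""
    return cid <= host_cid


def is_well_known(cid: int, host_cid: int) -> bool:
    """Explicit membership check that does not rely on CID ordering."""
    return cid in (VMADDR_CID_HYPERVISOR, VMADDR_CID_LOCAL, host_cid)


# With the fixed host CID both checks agree for every CID tried here.
agree_by_default = all(
    is_well_known_naive(c, VMADDR_CID_HOST) == is_well_known(c, VMADDR_CID_HOST)
    for c in range(8)
)

# With host_cid = 3 the range check also swallows CID 2.
naive_says = is_well_known_naive(2, 3)
explicit_says = is_well_known(2, 3)
```

Every comparison of this kind in the core would have to be audited and
replaced by something like the explicit check.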

This would require many changes in the af_vsock core as well. Perhaps we
can avoid touching the core in this way:

1. Extend vhost-vsock to support VMADDR_FLAG_TO_HOST (this is also
   needed when the user spawns a VM with CID = 3 using vhost-vsock).
   A new ioctl/sysfs entry would be needed to create an allowlist of
   CIDs that may or may not be accepted. (Note: as of now, vhost-vsock
   discards all packets that have dst_cid != 2.)

2. Create a new G2H transport that will be loaded in the host.
   The af_vsock core supports 3 transport types that can be loaded at
   runtime simultaneously: loopback, G2H (e.g. virtio-vsock, hyper-v,
   vmci driver), H2G (e.g. vhost-vsock kernel module). We originally
   introduced this extension to support nested VMs. This split is used
   mostly to handle CIDs:
   - loopback (local CID = 1)
   - H2G (local CID = 2)
   - G2H (local CID > 2)

   Perhaps the simplest thing is to extend vsock_loopback to be used
   here, but instead of registering as loopback (which can only handle
   CID 1), it should register as G2H; this way we reuse all the logic
   already in the af_vsock core to handle CIDs > 2.

   The only problem is that in this case your host can't be nested.
   But upstream there's a proposal to support multiple virtio-vsock
   devices in a guest, so we could adapt it to support this case in the
   future.
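
The two steps can be sketched as a toy model (plain Python, not kernel
code). `local_transport_for` encodes the CID split listed in step 2, and
`vhost_accepts` models the filter from step 1; both function names and
the shape of the allowlist are assumptions, since the ioctl/sysfs
interface does not exist yet.

```python
from typing import Optional, Set

VMADDR_CID_LOCAL = 1
VMADDR_CID_HOST = 2


def local_transport_for(cid: int) -> str:
    """Which transport type owns a given local CID, per the split above."""
    if cid == VMADDR_CID_LOCAL:
        return "loopback"
    if cid == VMADDR_CID_HOST:
        return "H2G"
    if cid > VMADDR_CID_HOST:
        return "G2H"
    raise ValueError(f"CID {cid} is not a usable local CID in this model")


def vhost_accepts(dst_cid: int, allowlist: Optional[Set[int]] = None) -> bool:
    """Step 1: today only dst_cid == 2 passes; an allowlist would widen that."""
    if dst_cid == VMADDR_CID_HOST:
        return True
    return allowlist is not None and dst_cid in allowlist


# Current behaviour: a packet from the enclave to CID 3 is dropped.
dropped_today = not vhost_accepts(3)
# With a (hypothetical) allowlist containing 3 it would be accepted, and a
# loopback-like transport registered as G2H could own local CID 3 on the host.
accepted_with_allowlist = vhost_accepts(3, {3})
owner_of_cid_3 = local_transport_for(3)
```

In this model nothing in the core changes: CID 3 simply falls into the
existing "CIDs > 2" bucket that G2H already handles.
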
WDYT?
Thanks,
Stefano