netdev - Re: How to implement message forwarding from one CID to another in vhost driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3a62a9d1-5864-4f00-bcf0-2c64552ee90c@csgraf.de>
Date: Tue, 21 May 2024 08:50:22 +0300
From: Alexander Graf <agraf@...raf.de>
To: Dorjoy Chowdhury <dorjoychy111@...il.com>,
 Stefano Garzarella <sgarzare@...hat.com>
Cc: virtualization@...ts.linux.dev, kvm@...r.kernel.org,
 netdev@...r.kernel.org, Alexander Graf <graf@...zon.com>, stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
 vhost driver

Howdy,

On 20.05.24 14:44, Dorjoy Chowdhury wrote:
> Hey Stefano,
>
> Thanks for the reply.
>
>
> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@...hat.com> wrote:
>> Hi Dorjoy,
>>
>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>> Hi,
>>>
>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>> emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
>>> patch series has already been posted to the qemu-devel mailing list[2].
>>>
>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
>>> execution environments, called enclaves, from Amazon EC2 instances, which are
>>> used for processing highly sensitive data. Enclaves have no persistent storage
>>> and no external networking. The enclave VMs are based on Firecracker microvm
>>> and have a vhost-vsock device for communication with the parent EC2 instance
>>> that spawned it and a Nitro Secure Module (NSM) device for cryptographic
>>> attestation. The parent instance VM always has CID 3 while the enclave VM gets
>>> a dynamic CID. The enclave VMs can communicate with the parent instance over
>>> various ports to CID 3, for example, the init process inside an enclave sends a
>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
>>> parent instance know that the enclave VM has successfully booted.
>>>
>>> The plan is to eventually make the nitro enclave emulation in QEMU standalone
>>> i.e., without needing to run another VM with CID 3 with proper vsock
>> If you don't have to launch another VM, maybe we can avoid vhost-vsock
>> and emulate virtio-vsock in user-space, having complete control over the
>> behavior.
>>
>> So we could use this opportunity to implement virtio-vsock in QEMU [4]
>> or use vhost-user-vsock [5] and customize it somehow.
>> (Note: vhost-user-vsock already supports sibling communication, so maybe
>> with a few modifications it fits your case perfectly)
>>
>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>> [5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>
>
> Thanks for letting me know. Right now I don't have a complete picture
> but I will look into them. Thank you.
>>
>>
>>> communication support. For this to work, one approach could be to teach the
>>> vhost driver in kernel to forward CID 3 messages to another CID N
>> So in this case both CID 3 and N would be assigned to the same QEMU
>> process?
>
>
> CID N is assigned to the enclave VM. CID 3 was supposed to be the
> parent VM that spawns the enclave VM (this is how it is in AWS, where
> an EC2 instance VM spawns the enclave VM from inside it and that
> parent EC2 instance always has CID 3). But in the QEMU case as we
> don't want a parent VM (we want to run enclave VMs standalone) we
> would need to forward the CID 3 messages to host CID. I don't know if
> it means CID 3 and CID N is assigned to the same QEMU process. Sorry.


There are 2 use cases here:

1) Enclave wants to treat host as parent (default). In this scenario, 
the "parent instance" that shows up as CID 3 in the Enclave doesn't 
really exist. Instead, when the Enclave attempts to talk to CID 3, it 
should really land on CID 0 (hypervisor). When the hypervisor tries to 
connect to the Enclave on port X, it should look as if it originates 
from CID 3, not CID 0.

2) Multiple parent VMs. Think of an actual cloud hosting scenario. Here, 
we have multiple "parent instances". Each of them thinks it's CID 3. 
Each can spawn an Enclave that talks to CID 3 and reach the parent. For 
this case, I think implementing all of virtio-vsock in user space is the 
best path forward. But in theory, you could also swizzle CIDs to make 
random "real" CIDs appear as CID 3.


>
>> Do you have to allocate 2 separate virtio-vsock devices, one for the
>> parent and one for the enclave?
>
>
> If there is a parent VM, then I guess both parent and enclave VMs need
> virtio-vsock devices.
>
>>> (set to CID 2 for host) i.e., it patches CID from 3 to N on incoming messages
>>> and from N to 3 on responses. This will enable users of the
>> Will these messages have the VMADDR_FLAG_TO_HOST flag set?
>>
>> We don't support this in vhost-vsock yet, if supporting it helps, we
>> might, but we need to better understand how to avoid security issues, so
>> maybe each device needs to explicitly enable the feature and specify
>> from which CIDs it accepts packets.
>
>
> I don't know about the flag. So I don't know if it will be set. Sorry.


 From the guest's point of view, the parent (CID 3) is just another VM. 
Since Linux as of

  https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117

always sets VMADDR_FLAG_TO_HOST when local_CID > 0 && remote_CID > 0, I 
would say the message has the flag set.

How would you envision the host to implement the flag? Would the host 
allow user space to listen on any CID and hence receive the respective 
target connections? And wouldn't listening on CID 0 then mean you're 
effectively listening to "any" other CID? Thinking about that a bit 
more, that may be just what we need, yes :)


>
>
>>> nitro-enclave machine
>>> type in QEMU to run the necessary vsock server/clients in the host machine
>>> (some defaults can be implemented in QEMU as well, for example, sending a reply
>>> to the heartbeat) which will rid them of the cumbersome way of running another
>>> whole VM with CID 3. This way, users of nitro-enclave machine in QEMU, could
>>> potentially also run multiple enclaves with their messages for CID 3 forwarded
>>> to different CIDs which, in QEMU side, could then be specified using a new
>>> machine type option (parent-cid) if implemented. I guess in the QEMU side, this
>>> will be an ioctl call (or some other way) to indicate to the host kernel that
>>> the CID 3 messages need to be forwarded. Does this approach of
>> What if there is already a VM with CID = 3 in the system?
>
>
> Good question! I don't know what should happen in this case.


See case 2 above :). In a nutshell, I don't think it'd be legal to have 
a real CID 3 in that scenario.


>
>
>>> forwarding CID 3 messages to another CID sound good?
>> It seems too specific a case, if we can generalize it maybe we could
>> make this change, but we would like to avoid complicating vhost-vsock
>> and keep it as simple as possible to avoid then having to implement
>> firewalls, etc.
>>
>> So first I would see if vhost-user-vsock or the QEMU built-in device is
>> right for this use-case.
> Thanks you! I will check everything out and reach out if I need
> further guidance about what needs to be done. And sorry as I wasn't
> able to answer some of your questions.


As mentioned above, I think there is merit for both. I personally care a 
lot more for case 1 over case 2: We already have a working 
implementation of Nitro Enclaves in a Cloud setup. What is missing is a 
way to easily run a Nitro Enclave locally for development.


Alex