Message-ID: <14e68dd8-b2fa-496f-8dfc-a883ad8434f5@redhat.com>
Date: Tue, 28 May 2024 17:19:34 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Alexander Graf <graf@...zon.com>, Stefano Garzarella
 <sgarzare@...hat.com>, Alexander Graf <agraf@...raf.de>
Cc: Dorjoy Chowdhury <dorjoychy111@...il.com>,
 virtualization@...ts.linux.dev, kvm@...r.kernel.org, netdev@...r.kernel.org,
 stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
 vhost driver

On 5/27/24 09:54, Alexander Graf wrote:
> 
> On 27.05.24 09:08, Alexander Graf wrote:
>> Hey Stefano,
>>
>> On 23.05.24 10:45, Stefano Garzarella wrote:
>>> On Tue, May 21, 2024 at 08:50:22AM GMT, Alexander Graf wrote:
>>>> Howdy,
>>>>
>>>> On 20.05.24 14:44, Dorjoy Chowdhury wrote:
>>>>> Hey Stefano,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>>
>>>>> On Mon, May 20, 2024, 2:55 PM Stefano Garzarella <sgarzare@...hat.com> wrote:
>>>>>> Hi Dorjoy,
>>>>>>
>>>>>> On Sat, May 18, 2024 at 04:17:38PM GMT, Dorjoy Chowdhury wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Hope you are doing well. I am working on adding AWS Nitro Enclave[1]
>>>>>>> emulation support in QEMU. Alexander Graf is mentoring me on this work. A v1
>>>>>>> patch series has already been posted to the qemu-devel mailing list[2].
>>>>>>>
>>>>>>> AWS nitro enclaves is an Amazon EC2[3] feature that allows creating isolated
>>>>>>> execution environments, called enclaves, from Amazon EC2 instances, which are
>>>>>>> used for processing highly sensitive data. Enclaves have no persistent storage
>>>>>>> and no external networking. The enclave VMs are based on Firecracker microvm
>>>>>>> and have a vhost-vsock device for communication with the parent EC2 instance
>>>>>>> that spawned it and a Nitro Secure Module (NSM) device for cryptographic
>>>>>>> attestation. The parent instance VM always has CID 3 while the enclave VM gets
>>>>>>> a dynamic CID. The enclave VMs can communicate with the parent instance over
>>>>>>> various ports to CID 3, for example, the init process inside an enclave sends a
>>>>>>> heartbeat to port 9000 upon boot, expecting a heartbeat reply, letting the
>>>>>>> parent instance know that the enclave VM has successfully booted.
>>>>>>>
>>>>>>> The plan is to eventually make the nitro enclave emulation in QEMU standalone
>>>>>>> i.e., without needing to run another VM with CID 3 with proper vsock
>>>>>> If you don't have to launch another VM, maybe we can avoid vhost-vsock
>>>>>> and emulate virtio-vsock in user-space, having complete control over the
>>>>>> behavior.
>>>>>>
>>>>>> So we could use this opportunity to implement virtio-vsock in QEMU [4]
>>>>>> or use vhost-user-vsock [5] and customize it somehow.
>>>>>> (Note: vhost-user-vsock already supports sibling communication, so maybe
>>>>>> with a few modifications it fits your case perfectly)
>>>>>>
>>>>>> [4] https://gitlab.com/qemu-project/qemu/-/issues/2095
>>>>>> [5] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
>>>>>
>>>>>
>>>>> Thanks for letting me know. Right now I don't have a complete picture
>>>>> but I will look into them. Thank you.
>>>>>>
>>>>>>
>>>>>>> communication support. For this to work, one approach could be to teach the
>>>>>>> vhost driver in kernel to forward CID 3 messages to another CID N
>>>>>> So in this case both CID 3 and N would be assigned to the same QEMU
>>>>>> process?
>>>>>
>>>>>
>>>>> CID N is assigned to the enclave VM. CID 3 was supposed to be the
>>>>> parent VM that spawns the enclave VM (this is how it is in AWS, where
>>>>> an EC2 instance VM spawns the enclave VM from inside it and that
>>>>> parent EC2 instance always has CID 3). But in the QEMU case as we
>>>>> don't want a parent VM (we want to run enclave VMs standalone) we
>>>>> would need to forward the CID 3 messages to the host CID. I don't know if
>>>>> it means CID 3 and CID N are assigned to the same QEMU process. Sorry.
>>>>
>>>>
>>>> There are 2 use cases here:
>>>>
>>>> 1) Enclave wants to treat host as parent (default). In this scenario,
>>>> the "parent instance" that shows up as CID 3 in the Enclave doesn't
>>>> really exist. Instead, when the Enclave attempts to talk to CID 3, it
>>>> should really land on CID 0 (hypervisor). When the hypervisor tries to
>>>> connect to the Enclave on port X, it should look as if it originates
>>>> from CID 3, not CID 0.
>>>>
>>>> 2) Multiple parent VMs. Think of an actual cloud hosting scenario.
>>>> Here, we have multiple "parent instances". Each of them thinks it's
>>>> CID 3. Each can spawn an Enclave that talks to CID 3 and reach the
>>>> parent. For this case, I think implementing all of virtio-vsock in
>>>> user space is the best path forward. But in theory, you could also
>>>> swizzle CIDs to make random "real" CIDs appear as CID 3.
>>>>
>>>
>>> Thank you for clarifying the use cases!
>>>
>>> Also for case 1, vhost-vsock doesn't support CID 0, so in my opinion
>>> it's easier to go into user-space with vhost-user-vsock or the built-in
>>> device.
>>
>>
>> Sorry, I believe I meant CID 2. Effectively for case 1, when a process 
>> on the hypervisor listens on port 1234, it should be visible as 3:1234 
>> from the VM and when the hypervisor process connects to <VM CID>:1234, 
>> it should look as if that connection came from CID 3.
> 
> 
> Now that I'm thinking about my message again: What if we just introduce 
> a sysfs/sysctl file for vsock that indicates the "host CID" (default: 
> 2)? Users that want vhost-vsock to behave as if the host is CID 3 can 
> just write 3 to it.
> 
> It means we'd need to change all references to VMADDR_CID_HOST to 
> instead refer to a global variable that indicates the new "host CID". 
> It'd need some more careful massaging to not break number namespace 
> assumptions (<= CID_HOST no longer works), but the idea should fly.
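
To illustrate the "<= CID_HOST no longer works" remark, here is a minimal
sketch (not kernel code) comparing a range-style guest CID check with an
explicit reserved-set check once the host CID becomes configurable. The
constant values match the well-known vsock CIDs; the function names and the
`host_cid` parameter are hypothetical, and whether CID 2 should become usable
by guests when the host moves to CID 3 is a policy question this thread does
not settle.

```python
# Well-known vsock CIDs (values as defined in the Linux vm_sockets UAPI header).
VMADDR_CID_HYPERVISOR = 0
VMADDR_CID_LOCAL = 1
VMADDR_CID_HOST = 2            # today's fixed host CID
VMADDR_CID_ANY = 0xFFFFFFFF    # wildcard, never a valid guest CID


def guest_cid_ok_range(cid, host_cid=VMADDR_CID_HOST):
    """Range-style check: assumes every reserved CID is <= the host CID."""
    return host_cid < cid < VMADDR_CID_ANY


def guest_cid_ok_explicit(cid, host_cid=VMADDR_CID_HOST):
    """Explicit check: reject only the CIDs that are actually reserved.

    host_cid is a hypothetical tunable standing in for the proposed
    sysfs/sysctl value.
    """
    reserved = {VMADDR_CID_HYPERVISOR, VMADDR_CID_LOCAL, host_cid, VMADDR_CID_ANY}
    return 0 <= cid <= VMADDR_CID_ANY and cid not in reserved


if __name__ == "__main__":
    # With the default host CID both checks agree; with host_cid=3 they differ.
    for host in (2, 3):
        for cid in range(0, 6):
            print(host, cid,
                  guest_cid_ok_range(cid, host),
                  guest_cid_ok_explicit(cid, host))
```

With the default host CID the two checks give the same answer for every CID;
they only diverge once the host CID is moved, which is the kind of assumption
that would need auditing.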

Forwarding one or more ports of a given CID to CID 2 (the host) should 
be doable with a dummy vhost client that listens to CID 3, connects to 
CID 2 and sends data back and forth.  That is not hard enough to justify 
changing all references to VMADDR_CID_HOST (also, I am not sure whether 
vsock supports network namespaces; the sysctl/sysfs way is not feasible 
if you cannot set it per-netns, can you?).  The sysctl/sysfs way also has 
the disadvantage that different QEMU instances are not insulated from 
each other.
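
The "send data back and forth" part could look roughly like the sketch
below. It is only the userspace byte relay, not the vhost side: claiming
CID 3 through the vhost device is not shown. The port numbers, function
names, and the assumption that the process runs somewhere whose local CID is
3 are all hypothetical. The AF_VSOCK constants are the ones Python's socket
module exposes on Linux.

```python
import socket
import threading

# Hypothetical values for illustration; not taken from the thread.
LISTEN_PORT = 9000   # port the enclave connects to on "CID 3"
HOST_PORT = 9000     # port of the real service on the host (CID 2)
BUF_SIZE = 65536


def pump(src, dst):
    """Copy bytes from src to dst until src hits EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(BUF_SIZE)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer went away; fall through to the half-close
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def relay(a, b):
    """Shuttle data in both directions between two connected stream sockets."""
    t = threading.Thread(target=pump, args=(a, b), daemon=True)
    t.start()
    pump(b, a)
    t.join()
    a.close()
    b.close()


def serve_vsock(listen_port=LISTEN_PORT, host_port=HOST_PORT):
    """Accept vsock connections and forward each one to CID 2 (Linux only)."""
    srv = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    srv.bind((socket.VMADDR_CID_ANY, listen_port))
    srv.listen()
    while True:
        conn, _peer = srv.accept()
        upstream = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
        upstream.connect((socket.VMADDR_CID_HOST, host_port))
        threading.Thread(target=relay, args=(conn, upstream),
                         daemon=True).start()
```

`relay()` works on any pair of connected stream sockets, so it can be
exercised with `socket.socketpair()` on a machine without vsock support;
only `serve_vsock()` needs AF_VSOCK.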

I think it's either that or implementing virtio-vsock in userspace 
(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999979@redhat.com/, 
search for "To connect host<->guest").

Paolo

