Message-ID: <nojtsdora7chbhnblvygozoa4qui3ghivndvg5ixbsgebos4hg@e2jldxpf7sum>
Date: Tue, 2 Jul 2024 14:05:37 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Dorjoy Chowdhury <dorjoychy111@...il.com>
Cc: Alexander Graf <graf@...zon.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Alexander Graf <agraf@...raf.de>, virtualization@...ts.linux.dev, kvm@...r.kernel.org, 
	netdev@...r.kernel.org, stefanha@...hat.com
Subject: Re: How to implement message forwarding from one CID to another in
 vhost driver

On Sun, Jun 30, 2024 at 04:54:18PM GMT, Dorjoy Chowdhury wrote:
>Hey Stefano,
>Apart from my questions in my previous email, I have some others as well.
>
>If the vhost-device-vsock modification to forward packets to
>VMADDR_CID_LOCAL is implemented, does the VMADDR_FLAG_TO_HOST need to
>be set by any application in the guest? I understand that the flag is
>set automatically in the listen path by the driver (ref:
>https://patchwork.ozlabs.org/project/netdev/patch/20201204170235.84387-4-andraprs@amazon.com/#2594117
>), but from the comments in the referenced patch, I am guessing the
>applications in the guest that will "connect" (as opposed to listen)
>will need to set the flag in the application code? So does the
>VMADDR_FLAG_TO_HOST flag need to be set by the applications in the
>guest that will "connect" or should it work without it? I am asking
>because the nitro-enclave VMs have an "init" which tries to connect to
>CID 3 to send a "hello" on boot to let the parent VM know that it
>booted, expecting a "hello" reply, but the init doesn't seem to set the
>flag https://github.com/aws/aws-nitro-enclaves-sdk-bootstrap/blob/main/init/init.c#L356C1-L361C7

Looking at the af_vsock.c code, it looks like if we don't have any
H2G transports (e.g. vhost-vsock) loaded in the VM (this is loaded for
nested VMs, so I guess for a nitro-enclave VM this should not be the
case), the packets are forwarded to the host in any case.

See 
https://elixir.bootlin.com/linux/latest/source/net/vmw_vsock/af_vsock.c#L469
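
Roughly, the selection logic there looks like this (simplified sketch
of vsock_assign_transport(), see the link above for the exact code):

    /* Local loopback (vsock_loopback) wins if the remote CID is local. */
    if (vsock_use_local_transport(remote_cid))
            new_transport = transport_local;
    /* No H2G transport loaded (no nested vhost-vsock), CID <= 2, or
     * VMADDR_FLAG_TO_HOST set: the packet goes to the host (G2H). */
    else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
             (remote_flags & VMADDR_FLAG_TO_HOST))
            new_transport = transport_g2h;
    /* Otherwise it goes to the nested guest. */
    else
            new_transport = transport_h2g;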

>.
>
>I was following
>https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#sibling-vm-communication
>to test if sibling communication works and it seems like I didn't need
>to modify the "socat" to set the "VMADDR_FLAG_TO_HOST". I am wondering
>why it works without any modification. Here is what I do:
>
>shell1: ./vhost-device-vsock --vm
>guest-cid=3,uds-path=/tmp/vm3.vsock,socket=/tmp/vhost3.socket --vm
>guest-cid=4,uds-path=/tmp/vm4.vsock,socket=/tmp/vhost4.socket
>
>shell2: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
>-enable-kvm -m 8G -nic user,model=virtio -drive
>file=/home/dorjoy/Forks/test_vm/fedora2.qcow2,media=disk,if=virtio
>--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
>socket,id=char0,reconnect=0,path=/tmp/vhost3.socket -device
>vhost-user-vsock-pci,chardev=char0
>    inside this guest I run: socat - VSOCK-LISTEN:9000
>
>shell3: ./qemu-system-x86_64 -machine q35,memory-backend=mem0
>-enable-kvm -m 8G -nic user,model=virtio -drive
>file=/home/dorjoy/Forks/test_vm/fedora40.qcow2,media=disk,if=virtio
>--display sdl -object memory-backend-memfd,id=mem0,size=8G -chardev
>socket,id=char0,reconnect=0,path=/tmp/vhost4.socket -device
>vhost-user-vsock-pci,chardev=char0
>    inside this guest I run: socat - VSOCK-CONNECT:3:9000
>
>Then when I type something in the socat terminal of one VM and hit
>'enter', they pop up in the socat terminal of the other VM. From the
>documentation of the vhost-device-vsock, I thought I would need to
>patch socat to set the "VMADDR_FLAG_TO_HOST" but I did not do anything
>with socat. I simply did "sudo dnf install socat" in both VMs. I also
>looked into the socat source code and I didn't see any reference to
>"VMADDR_FLAG_TO_HOST". I am running "Fedora 40" on both VMs. Do you
>know why it works without the flag?

Yep, so the driver will forward them if the H2G transport is not loaded, 
as in your case. So if you set VMADDR_FLAG_TO_HOST you are sure the 
packet is always forwarded to the host; if you don't set it, it is 
forwarded only if you don't have a nested VM using vhost-vsock. In that 
case we can't differentiate between communication with a nested guest 
and communication with a sibling guest, which is why we added the flag.
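
For reference, here is a minimal sketch (untested, with an illustrative
helper name, assuming kernel/glibc headers recent enough to expose
svm_flags and VMADDR_FLAG_TO_HOST) of how a guest application could set
the flag explicitly when connecting:

    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    /* Connect to cid:port, forcing forwarding to the host even if a
     * nested vhost-vsock (H2G) transport is loaded in this guest. */
    static int vsock_connect_to_host(unsigned int cid, unsigned int port)
    {
            struct sockaddr_vm addr;
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;

            memset(&addr, 0, sizeof(addr));
            addr.svm_family = AF_VSOCK;
            addr.svm_cid = cid;      /* e.g. 3, as the nitro-enclave init uses */
            addr.svm_port = port;
            addr.svm_flags = VMADDR_FLAG_TO_HOST;

            if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                    close(fd);
                    return -1;
            }

            return fd;
    }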

If the host uses vhost-vsock, those packets are discarded, but 
vhost-device-vsock handles them.

Hope this clarifies.

Stefano

>
>On Wed, Jun 26, 2024 at 11:43 PM Dorjoy Chowdhury
><dorjoychy111@...il.com> wrote:
>>
>> Hey Stefano,
>> Thanks a lot for all the details. I will look into them and reach out
>> if I need further input. Thanks! I have tried to summarize my
>> understanding below. Let me know if that sounds correct.
>>
>> On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella <sgarzare@...hat.com> wrote:
>> >
>> > Hi Dorjoy,
>> >
>> > On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
>> > >Hey Stefano,
>> >
>> > [...]
>> >
>> > >> >
>> > >> >So the immediate plan would be to:
>> > >> >
>> > >> >  1) Build a new vhost-vsock-forward object model that connects to
>> > >> >vhost as CID 3 and then forwards every packet from CID 1 to the
>> > >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
>> > >>
>> > >> This, though, requires writing the virtio-vsock emulation in QEMU
>> > >> completely from scratch. If you have time that would be great; otherwise,
>> > >> if you want to do a PoC, my advice is to start with vhost-user-vsock,
>> > >> which is already there.
>> > >>
>> > >
>> > >Can you give me some more details about how I can implement the
>> > >daemon?
>> >
>> > We already have a daemon written in Rust, so I don't recommend you
>> > rewrite one from scratch, just start with that. You can find the daemon
>> > and instructions on how to use it with QEMU here [1].
>> >
>> > >I would appreciate some pointers to code too.
>> >
>> > I sent the pointer to it in my first reply [2].
>> >
>> > >
>> > >Right now, the "nitro-enclave" machine type (wip) in QEMU
>> > >automatically spawns a VHOST_VSOCK device with the CID equal to the
>> > >"guest-cid" machine option. I think this is equivalent to using the
>> > >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
>> > >need any change? I guess instead of "vhost-vsock-device", the
>> > >vhost-vsock device needs to be equivalent to "-device
>> > >vhost-user-vsock-device,guest-cid=N"?
>> >
>> > Nope, the vhost-user-vsock device requires just a `chardev` option.
>> > The chardev points to the Unix socket used by QEMU to talk with the
>> > daemon. The daemon has a parameter to set the CID. See [1] for the
>> > examples.
>> >
>> > >
>> > >The applications inside the nitro-enclave VM will still connect and
>> > >talk to CID 3. So on the daemon side, do we need to spawn a device
>> > >that has CID 3 and then forward everything this device receives to CID
>> > >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
>> > >to the "guest-cid"?
>> >
>> > Yep, I think this is right.
>> > Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
>> > kernel module.
>> >
>> > Before modifying the code, if you want to do some testing, perhaps you
>> > can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
>> > now exposes two unix sockets, one is used to communicate with QEMU via
>> > the vhost-user protocol, and the other is to be used by the application
>> > to communicate with vsock sockets in the guest using the hybrid protocol
>> > defined by firecracker. So you could initiate a socat between the latter
>> > and VMADDR_CID_LOCAL; the only problem I see is that you have to send
>> > the first string required by the hybrid protocol (CONNECT 1234), but for
>> > a PoC it should be ok.
>> >
>> > I just tried the following and it works without touching any code:
>> >
>> > shell1$ ./target/debug/vhost-device-vsock \
>> >      --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>> >
>> > shell2$ sudo modprobe vsock_loopback
>> > shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>> >
>> > shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>> >      -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>> >      -chardev socket,id=char0,path=/tmp/vhost3.socket \
>> >      -device vhost-user-vsock-pci,chardev=char0 \
>> >      -object memory-backend-memfd,id=mem,size=512M \
>> >      -nographic
>> >
>> >      guest$ nc --vsock -l 1234
>> >
>> > shell4$ nc --vsock 1 1234
>> > CONNECT 1234
>> >
>> >      Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>> >      defined by firecracker, so if we extend the vhost-device-vsock
>> >      daemon to forward packets to VMADDR_CID_LOCAL, that would not be
>> >      needed (including running socat).
>> >
>>
>> Understood. Just trying to think out loud what the final UX will be
>> from the user perspective to successfully run a nitro VM before I try
>> to modify vhost-device-vsock to support forwarding to
>> VMADDR_CID_LOCAL.
>> I guess because the "vhost-user-vsock" device needs to be spawned
>> implicitly (without any explicit option) inside nitro-enclave in QEMU,
>> we now need to provide the "chardev" as a machine option, so the
>> nitro-enclave command would look something like below:
>> "./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
>> /path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
>> --enable-kvm -cpu host"
>> and then set the chardev id to the vhost-user-vsock device in the code
>> from the machine option.
>>
>> The modified "vhost-device-vsock" would need to be run with the new
>> option that will forward everything to VMADDR_CID_LOCAL (below by the
>> "-z" I mean the new option)
>> "./target/debug/vhost-device-vsock -z --vm
>> guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
>> this means the guest-cid of the nitro VM is CID 5, right?
>>
>> And the applications in the host would need to use VMADDR_CID_LOCAL
>> for communication instead of "guest-cid" (5) (assuming vsock_loopback
>> is modprobed). Let's say there are 2 applications inside the nitro VM
>> that connect to CID 3 on port 9000 and 9001. And the applications on
>> the host listen on 9000 and 9001 using VMADDR_CID_LOCAL. So, after the
>> commands above (qemu VM and vhost-device-vsock) are run, the
>> communication between the applications in the host and the
>> applications in the nitro VM on port 9000 and 9001 should just work,
>> right, without needing to run any extra socat commands or such? or
>> will the user still need to run some socat commands for all the
>> relevant ports (e.g.,9000 and 9001)?
>>
>> I am just wondering what kind of changes are needed in
>> vhost-device-vsock for forwarding packets to VMADDR_CID_LOCAL? Will
>> that be something like this: the codepath that handles
>> "/tmp/vm5.vsock", upon receiving a "connect" (from inside the nitro
>> VM) for any port to "/tmp/vm5.vsock", vhost-device-vsock will just
>> connect to the same port using AF_VSOCK using the socket system calls
>> and messages received on that port in "/tmp/vm5.vsock" will be "send"
>> to the AF_VSOCK socket? or am I not thinking right and the
>> implementation would be something different entirely (change the CID
>> from 3 to 2 (or 1?) on the packets before they are handled then socat
>> will be needed probably)? Will this work if the applications in the
>> host want to connect to applications inside the nitro VM (as opposed
>> to applications inside the nitro VM connecting to CID 3)?
>>
>> Thanks and Regards,
>> Dorjoy
>

