Message-ID: <Z9MFvkALRY/k3ITG@devvm6277.cco0.facebook.com>
Date: Thu, 13 Mar 2025 09:20:14 -0700
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Eugenio Pérez <eperezma@...hat.com>,
Bryan Tan <bryan-bt.tan@...adcom.com>,
Vishnu Dasa <vishnu.dasa@...adcom.com>,
Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
"David S. Miller" <davem@...emloft.net>,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
kvm@...r.kernel.org
Subject: Re: [PATCH v2 0/3] vsock: add namespace support to vhost-vsock

On Thu, Mar 13, 2025 at 04:37:16PM +0100, Stefano Garzarella wrote:
> Hi Bobby,
> first of all, thank you for starting this work again!
>
You're welcome, thank you for your work getting it started!
> On Wed, Mar 12, 2025 at 07:28:33PM -0700, Bobby Eshleman wrote:
> > Hey all,
> >
> > Apologies for forgetting the 'net-next' prefix on this one. Should I
> > resend or no?
>
> I'd say let's do a first review cycle on this, then you can re-post.
> Please also check the maintainers in Cc; it looks like someone is missing:
> https://patchwork.kernel.org/project/netdevbpf/patch/20250312-vsock-netns-v2-1-84bffa1aa97a@gmail.com/
>
Duly noted, I'll double-check the Cc list next time. Re-posting after the
first review cycle sounds good to me!
> > On Wed, Mar 12, 2025 at 01:59:34PM -0700, Bobby Eshleman wrote:
> > > Picking up Stefano's v1 [1], this series adds netns support to
> > > vhost-vsock. Unlike v1, this series does not address guest-to-host (g2h)
> > > namespaces, deferring that for future implementation and discussion.
> > >
> > > Any vsock created with /dev/vhost-vsock is a global vsock, accessible
> > > from any namespace. Any vsock created with /dev/vhost-vsock-netns is a
> > > "scoped" vsock, accessible only to sockets in its namespace. If a global
> > > vsock and a scoped vsock share the same CID, the scoped vsock takes
> > > precedence.
>
> This applies inside the netns, right?
> I mean if we are in a netns, and there is a VM A attached to
> /dev/vhost-vsock-netns with CID=42 and a VM B attached to /dev/vhost-vsock
> also with CID=42, this means that VM A will not be accessible in the netns,
> but it can be accessible outside of the netns, right?
>
In this scenario, CID=42 goes to VM A (/dev/vhost-vsock-netns) for any
socket in VM A's namespace. For sockets in any other namespace, CID=42
goes to VM B (/dev/vhost-vsock).

If I understand your setup correctly:

    Namespace 1:
        VM A - /dev/vhost-vsock-netns, CID=42
        Process X
    Namespace 2:
        VM B - /dev/vhost-vsock, CID=42
        Process Y
    Namespace 3:
        Process Z

Taking connect() as an example:

    Process X connect(CID=42) goes to VM A
    Process Y connect(CID=42) goes to VM B
    Process Z connect(CID=42) goes to VM B

If VM A goes away (migration, shutdown, etc.):

    Process X connect(CID=42) also goes to VM B

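
To make the lookup rule above concrete, here is a rough userspace model of
it in Python. This is only a sketch of the intended semantics, not the
kernel implementation: the VsockRegistry class, its method names, and the
"ns1"/"ns2"/"ns3" labels are all hypothetical and exist only for
illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model of the CID lookup semantics, not kernel code.
# GLOBAL_NS marks VMs created through /dev/vhost-vsock (visible everywhere).
GLOBAL_NS = None


class VsockRegistry:
    def __init__(self) -> None:
        # Key: (namespace label, or GLOBAL_NS for a global vsock, CID)
        self._vms: Dict[Tuple[Optional[str], int], str] = {}

    def add_vm(self, name: str, cid: int, scoped_ns: Optional[str] = None) -> None:
        """Register a VM; scoped_ns is set for /dev/vhost-vsock-netns."""
        self._vms[(scoped_ns, cid)] = name

    def remove_vm(self, cid: int, scoped_ns: Optional[str] = None) -> None:
        """Drop a VM, e.g. on shutdown or migration."""
        self._vms.pop((scoped_ns, cid), None)

    def resolve(self, caller_ns: str, cid: int) -> Optional[str]:
        """Return the VM that connect(cid) from caller_ns would reach."""
        # A scoped vsock in the caller's namespace takes precedence...
        scoped = self._vms.get((caller_ns, cid))
        if scoped is not None:
            return scoped
        # ...otherwise fall back to the global vsock, if there is one.
        return self._vms.get((GLOBAL_NS, cid))


reg = VsockRegistry()
reg.add_vm("VM A", cid=42, scoped_ns="ns1")  # /dev/vhost-vsock-netns
reg.add_vm("VM B", cid=42)                   # /dev/vhost-vsock
```

With this model, reg.resolve("ns1", 42) gives VM A, while the same call
from "ns2" or "ns3" gives VM B. After reg.remove_vm(42, scoped_ns="ns1"),
namespace 1 falls back to VM B as well.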
> > >
> > > If a socket in a namespace connects with a global vsock, the CID becomes
> > > unavailable to any VMM in that namespace when creating new vsocks. If
> > > disconnected, the CID becomes available again.
>
> IIUC, if an application on the host running in a netns is connected to a
> guest attached to /dev/vhost-vsock (e.g. CID=42), a new guest can't ask
> for the same CID (42) on /dev/vhost-vsock-netns in the same netns while that
> connection is active. Is that right?
>
Right. Here is the scenario I am trying to avoid:

    Step 1: namespace 1, VM A allocated with CID 42 on /dev/vhost-vsock
    Step 2: namespace 2, connect(CID=42) (this is legal, preserves old
            behavior)
    Step 3: namespace 2, VM B allocated with CID 42 on
            /dev/vhost-vsock-netns

After step 3, CID=42 in namespace 2 should belong to VM B, but the
connection from step 2 would still be with VM A.

I think we have some options:

    1. disallow the new VM B because the namespace already has an active
       connection to VM A
    2. allow the existing connection to continue, but make sure that new
       connections go to VM B
    3. close the connection from namespace 2, spin up VM B, and hope the
       user handles the connection retry
    4. auto-retry connect to the new VM B? (seems like doing too much on
       the kernel side to me)

I chose option 1 for this rev mostly for simplicity, but I'm definitely
open to suggestions. I think option 3 would also be simple to implement.
Option 2 would require adding some concept of "vhost-vsock ns at time of
connection" to each socket, so the transport would know which vhost_vsock
to use for which socket.

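
For clarity, option 1 can be modeled roughly like this in Python. Again,
this is a sketch of the behavior only, under my own assumptions: the
Namespace class, the CidBusy exception, and the method names are
hypothetical, and the model says nothing about which errno the kernel
would actually return.

```python
from collections import Counter


class CidBusy(Exception):
    """Hypothetical error: the CID cannot be used for a new scoped vsock."""


class Namespace:
    def __init__(self, name: str) -> None:
        self.name = name
        self.scoped_cids = set()
        # CID -> number of active connections from this namespace to a
        # global (/dev/vhost-vsock) vsock with that CID.
        self.global_conns = Counter()

    def connect_global(self, cid: int) -> None:
        """A socket in this namespace connects to a global vsock."""
        self.global_conns[cid] += 1

    def disconnect_global(self, cid: int) -> None:
        """That connection goes away; the CID may become available again."""
        if self.global_conns[cid] > 0:
            self.global_conns[cid] -= 1

    def alloc_scoped_cid(self, cid: int) -> None:
        """A VMM asks for a CID on /dev/vhost-vsock-netns in this namespace."""
        if cid in self.scoped_cids:
            raise CidBusy(f"CID {cid} already scoped in {self.name}")
        # Option 1: refuse while the namespace still talks to a global
        # vsock under the same CID.
        if self.global_conns[cid] > 0:
            raise CidBusy(f"CID {cid} in use by a global connection")
        self.scoped_cids.add(cid)
```

In terms of the steps above: after connect_global(42) in namespace 2,
alloc_scoped_cid(42) raises CidBusy. Once disconnect_global(42) has run,
the same allocation succeeds.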
> > >
> > > Testing
> > >
> > > QEMU with /dev/vhost-vsock-netns support:
> > > https://github.com/beshleman/qemu/tree/vsock-netns
>
> You can also use unmodified QEMU with the `vhostfd` parameter of the
> `vhost-vsock-pci` device:
>
> # FD will contain the file descriptor to /dev/vhost-vsock-netns
> exec {FD}<>/dev/vhost-vsock-netns
>
> # pass FD to the device, this is used for example by libvirt
> qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> -drive file=fedora.qcow2,format=qcow2,if=virtio \
> -object memory-backend-memfd,id=mem,size=512M \
> -device vhost-vsock-pci,vhostfd=${FD},guest-cid=42 -nographic
>
Very nice, thanks, I didn't realize that!
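
For anyone scripting this from a test harness rather than a shell, the
same fd-passing trick might look roughly like the following in Python.
This is an untested sketch that just mirrors the command line quoted
above: build_qemu_argv() and launch() are hypothetical helper names, and
/dev/vhost-vsock-netns only exists with this series applied.

```python
import os
import subprocess
from typing import List

# Device node proposed by this series; absent on unpatched kernels.
VHOST_DEV = "/dev/vhost-vsock-netns"


def build_qemu_argv(fd: int, guest_cid: int = 42) -> List[str]:
    """Build the QEMU command line from the quoted example (hypothetical helper)."""
    return [
        "qemu-system-x86_64", "-smp", "2",
        "-M", "q35,accel=kvm,memory-backend=mem",
        "-drive", "file=fedora.qcow2,format=qcow2,if=virtio",
        "-object", "memory-backend-memfd,id=mem,size=512M",
        "-device", f"vhost-vsock-pci,vhostfd={fd},guest-cid={guest_cid}",
        "-nographic",
    ]


def launch(guest_cid: int = 42) -> int:
    """Open the vhost device and hand the fd to QEMU (hypothetical helper)."""
    fd = os.open(VHOST_DEV, os.O_RDWR)
    try:
        # pass_fds keeps fd open, under the same number, in the child.
        proc = subprocess.run(build_qemu_argv(fd, guest_cid), pass_fds=(fd,))
        return proc.returncode
    finally:
        os.close(fd)
```

Calling launch() from inside the target network namespace would then
play the role of the `exec {FD}<>...` line in the shell version.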
> That said, I agree we can extend QEMU with `netns` param too.
>
I'm open to either. Your solution above is super elegant.
> BTW, I'm traveling, I'll be back next Tuesday and I hope to take a deeper
> look at the patches.
>
> Thanks,
> Stefano
>
Thanks Stefano! Enjoy the travel.
Best,
Bobby