Message-ID: <Z+yDCKt7GpubbTKJ@devvm6277.cco0.facebook.com>
Date: Tue, 1 Apr 2025 17:21:28 -0700
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Daniel P. Berrangé <berrange@...hat.com>
Cc: Stefano Garzarella <sgarzare@...hat.com>,
	Jakub Kicinski <kuba@...nel.org>,
	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	Stefan Hajnoczi <stefanha@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Jason Wang <jasowang@...hat.com>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	Eugenio Pérez <eperezma@...hat.com>,
	Bryan Tan <bryan-bt.tan@...adcom.com>,
	Vishnu Dasa <vishnu.dasa@...adcom.com>,
	Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
	"David S. Miller" <davem@...emloft.net>,
	virtualization@...ts.linux.dev, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
	kvm@...r.kernel.org
Subject: Re: [PATCH v2 0/3] vsock: add namespace support to vhost-vsock

On Tue, Apr 01, 2025 at 08:05:16PM +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 28, 2025 at 06:03:19PM +0100, Stefano Garzarella wrote:
> > CCing Daniel
> > 
> > On Wed, Mar 12, 2025 at 01:59:34PM -0700, Bobby Eshleman wrote:
> > > Picking up Stefano's v1 [1], this series adds netns support to
> > > vhost-vsock. Unlike v1, this series does not address guest-to-host (g2h)
> > > namespaces, deferring that for future implementation and discussion.
> > > 
> > > Any vsock created with /dev/vhost-vsock is a global vsock, accessible
> > > from any namespace. Any vsock created with /dev/vhost-vsock-netns is a
> > > "scoped" vsock, accessible only to sockets in its namespace. If a global
> > > vsock and a scoped vsock share the same CID, the scoped vsock takes
> > > precedence.
> > > 
> > > If a socket in a namespace connects with a global vsock, the CID becomes
> > > unavailable to any VMM in that namespace when creating new vsocks. If
> > > disconnected, the CID becomes available again.
> > 
> > I was talking about this feature with Daniel and he pointed out something
> > interesting (Daniel please feel free to correct me):
> > 
> >     If we have a process in the host that does a listen(AF_VSOCK) in a
> > namespace, can this receive connections from guests connected to
> > /dev/vhost-vsock in any namespace?
> > 
> >     Should we provide something (e.g. sysctl/sysfs entry) to disable
> > this behaviour, preventing a process in a namespace from receiving
> > connections from the global vsock address space (i.e. /dev/vhost-vsock
> > VMs)?
> 
> I think my concern goes a bit beyond that, to the general conceptual
> idea of sharing the CID space between the global vsocks and namespace
> vsocks. So I'm not sure a sysctl would be sufficient...details later
> below..
> 
> > I understand that by default maybe we should allow this behaviour in order
> > to not break current applications, but in some cases the user may want to
> > isolate sockets in a namespace also from being accessed by VMs running in
> > the global vsock address space.
> > 
> > Indeed in this series we have talked mostly about the host -> guest path (as
> > the direction of the connection), but little about the guest -> host path,
> > maybe we should explain it better in the cover/commit
> > descriptions/documentation.
> 
> > > Testing
> > > 
> > > QEMU with /dev/vhost-vsock-netns support:
> > > 	https://github.com/beshleman/qemu/tree/vsock-netns
> > > 
> > > Test: Scoped vsocks isolated by namespace
> > > 
> > >  host# ip netns add ns1
> > >  host# ip netns add ns2
> > >  host# ip netns exec ns1 \
> > > 				  qemu-system-x86_64 \
> > > 					  -m 8G -smp 4 -cpu host -enable-kvm \
> > > 					  -serial mon:stdio \
> > > 					  -drive if=virtio,file=${IMAGE1} \
> > > 					  -device vhost-vsock-pci,netns=on,guest-cid=15
> > >  host# ip netns exec ns2 \
> > > 				  qemu-system-x86_64 \
> > > 					  -m 8G -smp 4 -cpu host -enable-kvm \
> > > 					  -serial mon:stdio \
> > > 					  -drive if=virtio,file=${IMAGE2} \
> > > 					  -device vhost-vsock-pci,netns=on,guest-cid=15
> > > 
> > >  host# socat - VSOCK-CONNECT:15:1234
> > >  2025/03/10 17:09:40 socat[255741] E connect(5, AF=40 cid:15 port:1234, 16): No such device
> > > 
> > >  host# echo foobar1 | sudo ip netns exec ns1 socat - VSOCK-CONNECT:15:1234
> > >  host# echo foobar2 | sudo ip netns exec ns2 socat - VSOCK-CONNECT:15:1234
> > > 
> > >  vm1# socat - VSOCK-LISTEN:1234
> > >  foobar1
> > >  vm2# socat - VSOCK-LISTEN:1234
> > >  foobar2
> > > 
> > > Test: Global vsocks accessible to any namespace
> > > 
> > >  host# qemu-system-x86_64 \
> > > 	  -m 8G -smp 4 -cpu host -enable-kvm \
> > > 	  -serial mon:stdio \
> > > 	  -drive if=virtio,file=${IMAGE2} \
> > > 	  -device vhost-vsock-pci,guest-cid=15,netns=off
> > > 
> > >  host# echo foobar | sudo ip netns exec ns1 socat - VSOCK-CONNECT:15:1234
> > > 
> > >  vm# socat - VSOCK-LISTEN:1234
> > >  foobar
> > > 
> > > Test: Connecting to global vsock makes CID unavailable to namespace
> > > 
> > >  host# qemu-system-x86_64 \
> > > 	  -m 8G -smp 4 -cpu host -enable-kvm \
> > > 	  -serial mon:stdio \
> > > 	  -drive if=virtio,file=${IMAGE2} \
> > > 	  -device vhost-vsock-pci,guest-cid=15,netns=off
> > > 
> > >  vm# socat - VSOCK-LISTEN:1234
> > > 
> > >  host# sudo ip netns exec ns1 socat - VSOCK-CONNECT:15:1234
> > >  host# ip netns exec ns1 \
> > > 				  qemu-system-x86_64 \
> > > 					  -m 8G -smp 4 -cpu host -enable-kvm \
> > > 					  -serial mon:stdio \
> > > 					  -drive if=virtio,file=${IMAGE1} \
> > > 					  -device vhost-vsock-pci,netns=on,guest-cid=15
> > > 
> > >  qemu-system-x86_64: -device vhost-vsock-pci,netns=on,guest-cid=15: vhost-vsock: unable to set guest cid: Address already in use
> 
> I find it conceptually quite unsettling that the VSOCK CID address
> space for AF_VSOCK is shared between the host and the namespace.
> That feels contrary to how namespaces are more commonly used for
> deterministically isolating resources between the namespace and the
> host.
> 
> Naively I would expect that in a namespace, all VSOCK CIDs are
> free for use, without having to concern yourself with what CIDs
> are in use in the host now, or in future.
> 

True, that would be ideal. I think the definition of backwards
compatibility we've established includes the notion that any VM may
reach any namespace and any namespace may reach any VM. IIUC, it sounds
like you are suggesting this be revised to more strictly adhere to
namespace semantics?

I do like Stefano's suggestion to add a sysctl for a "strict" mode,
since it offers the best of both worlds and still errs on the conservative
side in protecting existing applications... but I agree, in non-strict mode
vsock would be unique WRT the usual concept of namespaces.
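
Just to make the idea concrete, a per-namespace knob could be exposed via
register_net_sysctl(); the sketch below is purely illustrative (the flag,
the "net/vsock" path, and the pernet init are my own placeholders, not
something from this series):

#include <linux/sysctl.h>
#include <net/net_namespace.h>

static int vsock_ns_strict;	/* placeholder; real code would keep this in per-netns data */

static struct ctl_table vsock_ns_table[] = {
	{
		.procname	= "ns_strict",
		.data		= &vsock_ns_strict,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	{ }
};

static int __net_init vsock_sysctl_net_init(struct net *net)
{
	/* exposes /proc/sys/net/vsock/ns_strict in each namespace */
	if (!register_net_sysctl(net, "net/vsock", vsock_ns_table))
		return -ENOMEM;
	return 0;
}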

> What happens if we reverse the QEMU order above, to get the
> following scenario
> 
>    # Launch VM1 inside the NS
>    host# ip netns exec ns1 \
>   				  qemu-system-x86_64 \
>   					  -m 8G -smp 4 -cpu host -enable-kvm \
>   					  -serial mon:stdio \
>   					  -drive if=virtio,file=${IMAGE1} \
>   					  -device vhost-vsock-pci,netns=on,guest-cid=15
>    # Launch VM2
>    host# qemu-system-x86_64 \
>   	  -m 8G -smp 4 -cpu host -enable-kvm \
>   	  -serial mon:stdio \
>   	  -drive if=virtio,file=${IMAGE2} \
>   	  -device vhost-vsock-pci,guest-cid=15,netns=off
>   
>    vm1# socat - VSOCK-LISTEN:1234
>    vm2# socat - VSOCK-LISTEN:1234
> 
>    host# socat - VSOCK-CONNECT:15:1234
>      => Presume this connects to "VM2" running outside the NS
> 
>    host# sudo ip netns exec ns1 socat - VSOCK-CONNECT:15:1234
> 
>      => Does this connect to "VM1" inside the NS, or "VM2"
>         outside the NS ?
> 

VM1 inside the NS. The current logic is that whenever two CIDs collide
(local vs. global), the one in the local namespace is always selected,
irrespective of creation order.

With the sysctl option added and strict mode enabled, it would *never*
connect to the global one, even if there were no local match but there
was a global one.
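
Something like the following is the precedence I have in mind (all of the
helper names below are made up for illustration, not taken from the series):

/* Sketch only: a scoped vsock in the caller's netns always wins over a
 * global one with the same CID, regardless of creation order; a
 * hypothetical strict mode skips the global fallback entirely.
 */
#include <linux/types.h>
#include <net/net_namespace.h>

struct vhost_vsock;					/* opaque here */

static struct vhost_vsock *vhost_vsock_find_scoped(struct net *net, u32 cid);
static struct vhost_vsock *vhost_vsock_find_global(u32 cid);
static bool vsock_ns_strict(struct net *net);		/* hypothetical sysctl */

static struct vhost_vsock *vhost_vsock_lookup(struct net *net, u32 cid)
{
	struct vhost_vsock *vsock;

	vsock = vhost_vsock_find_scoped(net, cid);	/* /dev/vhost-vsock-netns */
	if (vsock)
		return vsock;

	if (vsock_ns_strict(net))			/* strict: no global fallback */
		return NULL;

	return vhost_vsock_find_global(cid);		/* /dev/vhost-vsock */
}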

> 
> 
> With regards,
> Daniel

Thanks for the review!

Best,
Bobby
