[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aSh9Qdsxl1Ot8byP@devvm11784.nha0.facebook.com>
Date: Thu, 27 Nov 2025 08:33:05 -0800
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
Stefan Hajnoczi <stefanha@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Eugenio Pérez <eperezma@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Bryan Tan <bryan-bt.tan@...adcom.com>,
Vishnu Dasa <vishnu.dasa@...adcom.com>,
Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
Shuah Khan <shuah@...nel.org>, linux-kernel@...r.kernel.org,
virtualization@...ts.linux.dev, netdev@...r.kernel.org,
kvm@...r.kernel.org, linux-hyperv@...r.kernel.org,
linux-kselftest@...r.kernel.org, berrange@...hat.com,
Sargun Dhillon <sargun@...gun.me>,
Bobby Eshleman <bobbyeshleman@...a.com>
Subject: Re: [PATCH net-next v12 02/12] vsock: add netns to vsock core
On Thu, Nov 27, 2025 at 03:25:32PM +0100, Stefano Garzarella wrote:
> On Wed, Nov 26, 2025 at 11:47:31PM -0800, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@...a.com>
> >
> > Add netns logic to vsock core. Additionally, modify transport hook
> > prototypes to be used by later transport-specific patches (e.g.,
> > *_seqpacket_allow()).
> >
> > Namespaces are supported primarily by changing socket lookup functions
> > (e.g., vsock_find_connected_socket()) to take into account the socket
> > namespace and the namespace mode before considering a candidate socket a
> > "match".
> >
> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
> > accepts the "global" or "local" mode strings.
> >
> > Add netns functionality (initialization, passing to transports, procfs,
> > etc...) to the af_vsock socket layer. Later patches that add netns
> > support to transports depend on this patch.
> >
> > dgram_allow(), stream_allow(), and seqpacket_allow() callbacks are
> > modified to take a vsk in order to perform logic on namespace modes. In
> > future patches, the net and net_mode will also be used for socket
> > lookups in these functions.
> >
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
> > ---
> > Changes in v12:
> > - return true in dgram_allow(), stream_allow(), and seqpacket_allow()
> > only if net_mode == VSOCK_NET_MODE_GLOBAL (Stefano)
> > - document bind(VMADDR_CID_ANY) case in af_vsock.c (Stefano)
> > - change order of stream_allow() call in vmci so we can pass vsk
> > to it
> >
> > Changes in v10:
> > - add file-level comment about what happens to sockets/devices
> > when the namespace mode changes (Stefano)
> > - change the 'if (write)' boolean in vsock_net_mode_string() to
> > if (!write), this simplifies a later patch which adds "goto"
> > for mutex unlocking on function exit.
> >
> > Changes in v9:
> > - remove virtio_vsock_alloc_rx_skb() (Stefano)
> > - remove vsock_global_dummy_net, not needed as net=NULL +
> > net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
> >
> > Changes in v7:
> > - hv_sock: fix hyperv build error
> > - explain why vhost does not use the dummy
> > - explain usage of __vsock_global_dummy_net
> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> > - use switch-case in vsock_net_mode_string()
> > - avoid changing transports as much as possible
> > - add vsock_find_{bound,connected}_socket_net()
> > - rename `vsock_hdr` to `sysctl_hdr`
> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> > global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> > - explain seqpacket_allow() change
> > - move net setting to __vsock_create() instead of vsock_create() so
> > that child sockets also have their net assigned upon accept()
> >
> > Changes in v6:
> > - unregister sysctl ops in vsock_exit()
> > - af_vsock: clarify description of CID behavior
> > - af_vsock: fix buf vs buffer naming, and length checking
> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
> >
> > Changes in v5:
> > - vsock_global_net() -> vsock_global_dummy_net()
> > - update comments for new uAPI
> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> > - add prototype changes so patch remains compilable
> > ---
> > drivers/vhost/vsock.c | 9 +-
> > include/linux/virtio_vsock.h | 4 +-
> > include/net/af_vsock.h | 13 +-
> > net/vmw_vsock/af_vsock.c | 272 +++++++++++++++++++++++++++++---
> > net/vmw_vsock/hyperv_transport.c | 7 +-
> > net/vmw_vsock/virtio_transport.c | 9 +-
> > net/vmw_vsock/virtio_transport_common.c | 6 +-
> > net/vmw_vsock/vmci_transport.c | 26 ++-
> > net/vmw_vsock/vsock_loopback.c | 8 +-
> > 9 files changed, 310 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae01457ea2cd..83937e1d63fa 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -404,7 +404,8 @@ static bool vhost_transport_msgzerocopy_allow(void)
> > return true;
> > }
> >
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > + u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > .transport = {
> > @@ -460,11 +461,15 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk,
> > + u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > bool seqpacket_allow = false;
> >
> > + if (vsk->net_mode != VSOCK_NET_MODE_GLOBAL)
> > + return false;
> > +
> > rcu_read_lock();
> > vsock = vhost_vsock_get(remote_cid);
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 0c67543a45c8..1845e8d4f78d 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -256,10 +256,10 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> >
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > -bool virtio_transport_stream_allow(u32 cid, u32 port);
> > +bool virtio_transport_stream_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> > int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > struct sockaddr_vm *addr);
> > -bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > +bool virtio_transport_dgram_allow(struct vsock_sock *vsk, u32 cid, u32 port);
> >
> > int virtio_transport_connect(struct vsock_sock *vsk);
> >
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 9b5bdd083b6f..d10e73cd7413 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -126,7 +126,7 @@ struct vsock_transport {
> > size_t len, int flags);
> > int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *,
> > struct msghdr *, size_t len);
> > - bool (*dgram_allow)(u32 cid, u32 port);
> > + bool (*dgram_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> >
> > /* STREAM. */
> > /* TODO: stream_bind() */
> > @@ -138,14 +138,14 @@ struct vsock_transport {
> > s64 (*stream_has_space)(struct vsock_sock *);
> > u64 (*stream_rcvhiwat)(struct vsock_sock *);
> > bool (*stream_is_active)(struct vsock_sock *);
> > - bool (*stream_allow)(u32 cid, u32 port);
> > + bool (*stream_allow)(struct vsock_sock *vsk, u32 cid, u32 port);
> >
> > /* SEQ_PACKET. */
> > ssize_t (*seqpacket_dequeue)(struct vsock_sock *vsk, struct msghdr *msg,
> > int flags);
> > int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg,
> > size_t len);
> > - bool (*seqpacket_allow)(u32 remote_cid);
> > + bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
> > u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
> >
> > /* Notification. */
> > @@ -218,6 +218,13 @@ void vsock_remove_connected(struct vsock_sock *vsk);
> > struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
> > struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > struct sockaddr_vm *dst);
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > + struct net *net,
> > + enum vsock_net_mode net_mode);
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > + struct sockaddr_vm *dst,
> > + struct net *net,
> > + enum vsock_net_mode net_mode);
> > void vsock_remove_sock(struct vsock_sock *vsk);
> > void vsock_for_each_connected_socket(struct vsock_transport *transport,
> > void (*fn)(struct sock *sk));
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index adcba1b7bf74..6113c22db8dc 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -83,6 +83,46 @@
> > * TCP_ESTABLISHED - connected
> > * TCP_CLOSING - disconnecting
> > * TCP_LISTEN - listening
> > + *
> > + * - Namespaces in vsock support two different modes configured
> > + * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
> > + * Each mode defines how the namespace interacts with CIDs.
> > + * /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
> > + * and locked down by a namespace manager. The default is "global". The mode
> > + * is set per-namespace.
> > + *
> > + * The modes affect the allocation and accessibility of CIDs as follows:
> > + *
> > + * - global - access and allocation are all system-wide
>
> nit: maybe we should mention that this mode is primarily for backward
> compatibility, since it's the way how vsock worked before netns support.
>
> (We can fix later eventually with a followup patch)
>
> > + * - all CID allocation from global namespaces draw from the same
> > + * system-wide pool.
> > + * - if one global namespace has already allocated some CID, another
> > + * global namespace will not be able to allocate the same CID.
> > + * - global mode AF_VSOCK sockets can reach any VM or socket in any global
> > + * namespace, they are not contained to only their own namespace.
> > + * - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
> > + * sockets in any local mode namespace.
> > + * - local - access and allocation are contained within the namespace
> > + * - CID allocation draws only from a private pool local only to the
> > + * namespace, and does not affect the CIDs available for allocation in any
> > + * other namespace (global or local).
> > + * - VMs in a local namespace do not collide with CIDs in any other local
> > + * namespace or any global namespace. For example, if a VM in a local mode
> > + * namespace is given CID 10, then CID 10 is still available for
> > + * allocation in any other namespace, but not in the same namespace.
> > + * - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
> > + * other sockets within their own namespace.
> > + * - sockets bound to VMADDR_CID_ANY in local namespaces will never resolve
> > + * to any transport that is not compatible with local mode. There is no
> > + * error that propagates to the user (as there is for connection attempts)
> > + * because it is possible for some packet to reach this socket from
> > + * a different transport that *does* support local mode. For
> > + * example, virtio-vsock may not support local mode, but the socket
> > + * may still accept a connection from vhost-vsock which does.
> > + *
> > + * - when a socket or device is initialized in a namespace with mode
> > + * global, it will stay in global mode even if the namespace later
> > + * changes to local.
> > */
> >
> > #include <linux/compat.h>
> > @@ -100,6 +140,7 @@
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/net.h>
> > +#include <linux/proc_fs.h>
> > #include <linux/poll.h>
> > #include <linux/random.h>
> > #include <linux/skbuff.h>
> > @@ -111,9 +152,18 @@
> > #include <linux/workqueue.h>
> > #include <net/sock.h>
> > #include <net/af_vsock.h>
> > +#include <net/netns/vsock.h>
> > #include <uapi/linux/vm_sockets.h>
> > #include <uapi/asm-generic/ioctls.h>
> >
> > +#define VSOCK_NET_MODE_STR_GLOBAL "global"
> > +#define VSOCK_NET_MODE_STR_LOCAL "local"
> > +
> > +/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
> > + * The newline is added by proc_dostring() for read operations.
> > + */
> > +#define VSOCK_NET_MODE_STR_MAX 8
> > +
> > static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
> > static void vsock_sk_destruct(struct sock *sk);
> > static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
> > @@ -235,33 +285,47 @@ static void __vsock_remove_connected(struct vsock_sock *vsk)
> > sock_put(&vsk->sk);
> > }
> >
> > -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > + struct net *net,
> > + enum vsock_net_mode net_mode)
> > {
> > struct vsock_sock *vsk;
> >
> > list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
> > - if (vsock_addr_equals_addr(addr, &vsk->local_addr))
> > - return sk_vsock(vsk);
> > + struct sock *sk = sk_vsock(vsk);
> > +
> > + if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
> > + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > + net_mode))
> > + return sk;
> >
> > if (addr->svm_port == vsk->local_addr.svm_port &&
> > (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
> > - addr->svm_cid == VMADDR_CID_ANY))
> > - return sk_vsock(vsk);
> > + addr->svm_cid == VMADDR_CID_ANY) &&
> > + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > + net_mode))
> > + return sk;
> > }
> >
> > return NULL;
> > }
> >
> > -static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
> > - struct sockaddr_vm *dst)
> > +static struct sock *
> > +__vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > + struct sockaddr_vm *dst, struct net *net,
> > + enum vsock_net_mode net_mode)
> > {
> > struct vsock_sock *vsk;
> >
> > list_for_each_entry(vsk, vsock_connected_s)ckets(src, dst),
> > connected_table) {
> > + struct sock *sk = sk_vsock(vsk);
> > +
> > if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
> > - dst->svm_port == vsk->local_addr.svm_port) {
> > - return sk_vsock(vsk);
> > + dst->svm_port == vsk->local_addr.svm_port &&
> > + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net,
> > + net_mode)) {
> > + return sk;
> > }
> > }
> >
> > @@ -304,12 +368,14 @@ void vsock_remove_connected(struct vsock_sock *vsk)
> > }
> > EXPORT_SYMBOL_GPL(vsock_remove_connected);
> >
> > -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr,
> > + struct net *net,
> > + enum vsock_net_mode net_mode)
> > {
> > struct sock *sk;
> >
> > spin_lock_bh(&vsock_table_lock);
> > - sk = __vsock_find_bound_socket(addr);
> > + sk = __vsock_find_bound_socket_net(addr, net, net_mode);
> > if (sk)
> > sock_hold(sk);
> >
> > @@ -317,15 +383,23 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> >
> > return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net);
> > +
> > +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
> > +{
> > + return vsock_find_bound_socket_net(addr, NULL, VSOCK_NET_MODE_GLOBAL);
>
> The patch LGTM, my last doubt now is if here (and in
> vsock_find_connected_socket() ) we should use `init_net`.
>
> In practice, this is the namespace (NULL) and mode (GLOBAL) used by
> transports that do not support namespaces.
>
> So here we are making them belong to no namespace, so they can only reach
> global ones. When any namespace, including `init_net`, switches to local, it
> can no longer be reached by transports that do not support local namespaces,
> because in practice we still do not have a way to associate a device (in the
> case of drivers) with a specific namespace. Right?
Right.
>
> If I get it right, it can makes sense, but I'd like an ack from net
> maintainers to be sure we are doing the right things.
>
> Also I think we should have a comment on top of this function to make it
> clear that should be used only by transport that doesn't support namespace,
> and the reason why we used NULL and GLOBAL. Plus a comment on top of this
> file (near where we described local vs global) to clarify the status of
> this.
>
> That said, if next week net-next will close, I think we can send a follow-up
> patch just for those comments, so:
Sounds good, I'll wait for further feedback before sending anything!
>
> Reviewed-by: Stefano Garzarella <sgarzare@...hat.com>
>
> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
> >
> > -struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > - struct sockaddr_vm *dst)
> > +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
> > + struct sockaddr_vm *dst,
> > + struct net *net,
> > + enum vsock_net_mode net_mode)
> > {
> > struct sock *sk;
> >
> > spin_lock_bh(&vsock_table_lock);
> > - sk = __vsock_find_connected_socket(src, dst);
> > + sk = __vsock_find_connected_socket_net(src, dst, net, net_mode);
> > if (sk)
> > sock_hold(sk);
> >
> > @@ -333,6 +407,14 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> >
> > return sk;
> > }
> > +EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net);
> > +
> > +struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
> > + struct sockaddr_vm *dst)
> > +{
> > + return vsock_find_connected_socket_net(src, dst,
> > + NULL, VSOCK_NET_MODE_GLOBAL);
> > +}
> > EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
>
Powered by blists - more mailing lists