Message-ID: <767jcion4jrguxsbshfap6dgncuhlgts2a5ybka5vdyos4x57d@ezkx72irws2h>
Date: Fri, 7 Nov 2025 14:53:28 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: Bobby Eshleman <bobbyeshleman@...il.com>
Cc: Shuah Khan <shuah@...nel.org>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Stefan Hajnoczi <stefanha@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Eugenio Pérez <eperezma@...hat.com>, "K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Bryan Tan <bryan-bt.tan@...adcom.com>, Vishnu Dasa <vishnu.dasa@...adcom.com>,
Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>, virtualization@...ts.linux.dev, netdev@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-hyperv@...r.kernel.org, berrange@...hat.com, Bobby Eshleman <bobbyeshleman@...a.com>
Subject: Re: [PATCH net-next v8 04/14] vsock: add netns to vsock core

On Thu, Nov 06, 2025 at 06:03:10PM -0800, Bobby Eshleman wrote:
>On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
>> On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
>> > From: Bobby Eshleman <bobbyeshleman@...a.com>
>> >
>> > Add netns logic to vsock core. Additionally, modify transport hook
>> > prototypes to be used by later transport-specific patches (e.g.,
>> > *_seqpacket_allow()).
>> >
>> > Namespaces are supported primarily by changing socket lookup functions
>> > (e.g., vsock_find_connected_socket()) to take into account the socket
>> > namespace and the namespace mode before considering a candidate socket a
>> > "match".
>> >
>> > Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
>> > used by transports that do not support namespacing. This dummy always
>> > has mode "global" to preserve previous CID behavior.
>> >
>> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
>> > accepts the "global" or "local" mode strings.
>> >
>> > The transports (besides vhost) are modified to use the global dummy,
>> > which makes them behave as if always in the global namespace. Vhost is
>> > an exception because it inherits its namespace from the process that
>> > opens the vhost device.
>> >
>> > Add netns functionality (initialization, passing to transports, procfs,
>> > etc.) to the af_vsock socket layer. Later patches that add netns
>> > support to transports depend on this patch.
>> >
>> > seqpacket_allow() callbacks are modified to take a vsk so that transport
>> > implementations can inspect sock_net(sk) and vsk->net_mode when performing
>> > lookups (e.g., vhost does this in its future netns patch). Because the
>> > API change affects all transports, it seemed more appropriate to make
>> > this internal API change in the "vsock core" patch than in the "vhost"
>> > patch.
>> >
>> > Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
>> > ---
>> > Changes in v7:
>> > - hv_sock: fix hyperv build error
>> > - explain why vhost does not use the dummy
>> > - explain usage of __vsock_global_dummy_net
>> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
>> > - use switch-case in vsock_net_mode_string()
>> > - avoid changing transports as much as possible
>> > - add vsock_find_{bound,connected}_socket_net()
>> > - rename `vsock_hdr` to `sysctl_hdr`
>> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
>> >   global mode for virtio-vsock, move skb->cb zero-ing into wrapper
>> > - explain seqpacket_allow() change
>> > - move net setting to __vsock_create() instead of vsock_create() so
>> >   that child sockets also have their net assigned upon accept()
>> >
>> > Changes in v6:
>> > - unregister sysctl ops in vsock_exit()
>> > - af_vsock: clarify description of CID behavior
>> > - af_vsock: fix buf vs buffer naming, and length checking
>> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
>> >
>> > Changes in v5:
>> > - vsock_global_net() -> vsock_global_dummy_net()
>> > - update comments for new uAPI
>> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
>> > - add prototype changes so patch remains compilable
>> > ---
>> > drivers/vhost/vsock.c | 4 +-
>> > include/linux/virtio_vsock.h | 21 ++++
>> > include/net/af_vsock.h | 14 ++-
>> > net/vmw_vsock/af_vsock.c | 264 ++++++++++++++++++++++++++++++++++++---
>> > net/vmw_vsock/virtio_transport.c | 7 +-
>> > net/vmw_vsock/vsock_loopback.c | 4 +-
>> > 6 files changed, 288 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index ae01457ea2cd..34adf0cf9124 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
>> >  	return true;
>> >  }
>> >
>> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
>> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
>> >
>> >  static struct virtio_transport vhost_transport = {
>> >  	.transport = {
>> > @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
>> >  	.send_pkt = vhost_transport_send_pkt,
>> >  };
>> >
>> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
>> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
>> >  {
>> >  	struct vhost_vsock *vsock;
>> >  	bool seqpacket_allow = false;
>> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > index 7f334a32133c..29290395054c 100644
>> > --- a/include/linux/virtio_vsock.h
>> > +++ b/include/linux/virtio_vsock.h
>> > @@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
>> >  	VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
>> >  }
>> >
>> > +static inline struct sk_buff *
>> > +virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
>> > +{
>> > +	struct sk_buff *skb;
>> > +
>> > +	skb = virtio_vsock_alloc_linear_skb(size, mask);
>> > +	if (!skb)
>> > +		return NULL;
>> > +
>> > +	memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
>> > +
>> > +	/* virtio-vsock does not yet support namespaces, so on receive
>> > +	 * we force legacy namespace behavior using the global dummy net
>> > +	 * and global net mode.
>> > +	 */
>> > +	virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
>> > +	virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
>> > +
>> > +	return skb;
>> > +}
>>
>> Why are we introducing this change in this patch?
>>
>> Where is the net of the virtio skb read?
>>
>
>Oh good point, this is a weird place for this. I'll move this to where
>it is actually used.
>
>[...]
>
>> >
>> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
>> > +				 void *buffer, size_t *lenp, loff_t *ppos)
>> > +{
>> > +	char data[VSOCK_NET_MODE_STR_MAX] = {0};
>> > +	enum vsock_net_mode mode;
>> > +	struct ctl_table tmp;
>> > +	struct net *net;
>> > +	int ret;
>> > +
>> > +	if (!table->data || !table->maxlen || !*lenp) {
>> > +		*lenp = 0;
>> > +		return 0;
>> > +	}
>> > +
>> > +	net = current->nsproxy->net_ns;
>> > +	tmp = *table;
>> > +	tmp.data = data;
>> > +
>> > +	if (!write) {
>> > +		const char *p;
>> > +
>> > +		mode = vsock_net_mode(net);
>> > +
>> > +		switch (mode) {
>> > +		case VSOCK_NET_MODE_GLOBAL:
>> > +			p = VSOCK_NET_MODE_STR_GLOBAL;
>> > +			break;
>> > +		case VSOCK_NET_MODE_LOCAL:
>> > +			p = VSOCK_NET_MODE_STR_LOCAL;
>> > +			break;
>> > +		default:
>> > +			WARN_ONCE(true, "netns has invalid vsock mode");
>> > +			*lenp = 0;
>> > +			return 0;
>> > +		}
>> > +
>> > +		strscpy(data, p, sizeof(data));
>> > +		tmp.maxlen = strlen(p);
>> > +	}
>> > +
>> > +	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
>> > +	if (ret)
>> > +		return ret;
>> > +
>> > +	if (write) {
>>
>> Do we need to check some capability, e.g. CAP_NET_ADMIN ?
>>
>
>We get that for free via the sysctl_net registration, through this path
>on open (CAP_NET_ADMIN is checked in net_ctl_permissions):
>
> net_ctl_permissions+1
> sysctl_perm+24
> proc_sys_permission+117
> inode_permission+217
> link_path_walk+162
> path_openat+152
> do_filp_open+171
> do_sys_openat2+98
> __x64_sys_openat+69
> do_syscall_64+93
>
>Verified with:
>
>cp /bin/echo /tmp/echo_netadmin
>setcap cap_net_admin+ep /tmp/echo_netadmin
>
>(non-root user fails with regular echo, succeeds with
>/tmp/echo_netadmin)

Thanks for checking!

Stefano
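
For reference, the mapping between the ns_mode sysctl strings and the mode enum can be sketched in userspace C. This is only an illustration, not the patch's code: the read side mirrors the switch quoted above, while the write side (cut off in the quote) is an assumption, as are the helper names mode_to_string() and string_to_mode(), the enum values, and the handling of a trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for the kernel names quoted above; the numeric
 * values are assumptions made for this sketch.
 */
enum vsock_net_mode {
	VSOCK_NET_MODE_GLOBAL,
	VSOCK_NET_MODE_LOCAL,
};

#define VSOCK_NET_MODE_STR_GLOBAL "global"
#define VSOCK_NET_MODE_STR_LOCAL  "local"

/* Read side: same mapping as the switch in the quoted handler.
 * Returns NULL for a mode that is neither global nor local.
 */
static const char *mode_to_string(enum vsock_net_mode mode)
{
	switch (mode) {
	case VSOCK_NET_MODE_GLOBAL:
		return VSOCK_NET_MODE_STR_GLOBAL;
	case VSOCK_NET_MODE_LOCAL:
		return VSOCK_NET_MODE_STR_LOCAL;
	default:
		return NULL;
	}
}

/* Write side (hypothetical helper): accept exactly "global" or "local",
 * optionally followed by one newline as written by echo.
 * Returns 0 and sets *mode on success, -1 on any other input.
 */
static int string_to_mode(const char *buf, enum vsock_net_mode *mode)
{
	size_t len = strlen(buf);

	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen(VSOCK_NET_MODE_STR_GLOBAL) &&
	    !strncmp(buf, VSOCK_NET_MODE_STR_GLOBAL, len)) {
		*mode = VSOCK_NET_MODE_GLOBAL;
		return 0;
	}

	if (len == strlen(VSOCK_NET_MODE_STR_LOCAL) &&
	    !strncmp(buf, VSOCK_NET_MODE_STR_LOCAL, len)) {
		*mode = VSOCK_NET_MODE_LOCAL;
		return 0;
	}

	return -1;
}
```

In this sketch, "local" followed by a newline parses to VSOCK_NET_MODE_LOCAL, and anything other than the two mode strings is rejected. The permission check discussed above happens before any of this, at open time.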