Message-ID: <aQ1TXjb8AWIzgAu4@devvm11784.nha0.facebook.com>
Date: Thu, 6 Nov 2025 18:03:10 -0800
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: Shuah Khan <shuah@...nel.org>, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Stefan Hajnoczi <stefanha@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Jason Wang <jasowang@...hat.com>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	Eugenio Pérez <eperezma@...hat.com>,
	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	Bryan Tan <bryan-bt.tan@...adcom.com>,
	Vishnu Dasa <vishnu.dasa@...adcom.com>,
	Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
	virtualization@...ts.linux.dev, netdev@...r.kernel.org,
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org, linux-hyperv@...r.kernel.org,
	berrange@...hat.com, Bobby Eshleman <bobbyeshleman@...a.com>
Subject: Re: [PATCH net-next v8 04/14] vsock: add netns to vsock core

On Thu, Nov 06, 2025 at 05:18:00PM +0100, Stefano Garzarella wrote:
> On Thu, Oct 23, 2025 at 11:27:43AM -0700, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@...a.com>
> > 
> > Add netns logic to vsock core. Additionally, modify transport hook
> > prototypes to be used by later transport-specific patches (e.g.,
> > *_seqpacket_allow()).
> > 
> > Namespaces are supported primarily by changing socket lookup functions
> > (e.g., vsock_find_connected_socket()) to take into account the socket
> > namespace and the namespace mode before considering a candidate socket a
> > "match".
> > 
> > Introduce a dummy namespace struct, __vsock_global_dummy_net, to be
> > used by transports that do not support namespacing. This dummy always
> > has mode "global" to preserve previous CID behavior.
> > 
> > This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that
> > accepts the "global" or "local" mode strings.
> > 
> > The transports (besides vhost) are modified to use the global dummy,
> > which makes them behave as if always in the global namespace. Vhost is
> > an exception because it inherits its namespace from the process that
> > opens the vhost device.
> > 
> > Add netns functionality (initialization, passing to transports, procfs,
> > etc.) to the af_vsock socket layer. Later patches that add netns
> > support to transports depend on this patch.
> > 
> > seqpacket_allow() callbacks are modified to take a vsk so that transport
> > implementations can inspect sock_net(sk) and vsk->net_mode when performing
> > lookups (e.g., vhost does this in its future netns patch). Because the
> > API change affects all transports, it seemed more appropriate to make
> > this internal API change in the "vsock core" patch than in the "vhost"
> > patch.
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
> > ---
> > Changes in v7:
> > - hv_sock: fix hyperv build error
> > - explain why vhost does not use the dummy
> > - explain usage of __vsock_global_dummy_net
> > - explain why VSOCK_NET_MODE_STR_MAX is 8 characters
> > - use switch-case in vsock_net_mode_string()
> > - avoid changing transports as much as possible
> > - add vsock_find_{bound,connected}_socket_net()
> > - rename `vsock_hdr` to `sysctl_hdr`
> > - add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
> >  global mode for virtio-vsock, move skb->cb zero-ing into wrapper
> > - explain seqpacket_allow() change
> > - move net setting to __vsock_create() instead of vsock_create() so
> >  that child sockets also have their net assigned upon accept()
> > 
> > Changes in v6:
> > - unregister sysctl ops in vsock_exit()
> > - af_vsock: clarify description of CID behavior
> > - af_vsock: fix buf vs buffer naming, and length checking
> > - af_vsock: fix length checking w/ correct ctl_table->maxlen
> > 
> > Changes in v5:
> > - vsock_global_net() -> vsock_global_dummy_net()
> > - update comments for new uAPI
> > - use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
> > - add prototype changes so patch remains compilable
> > ---
> > drivers/vhost/vsock.c            |   4 +-
> > include/linux/virtio_vsock.h     |  21 ++++
> > include/net/af_vsock.h           |  14 ++-
> > net/vmw_vsock/af_vsock.c         | 264 ++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/virtio_transport.c |   7 +-
> > net/vmw_vsock/vsock_loopback.c   |   4 +-
> > 6 files changed, 288 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index ae01457ea2cd..34adf0cf9124 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void)
> > 	return true;
> > }
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
> > 
> > static struct virtio_transport vhost_transport = {
> > 	.transport = {
> > @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = {
> > 	.send_pkt = vhost_transport_send_pkt,
> > };
> > 
> > -static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
> > {
> > 	struct vhost_vsock *vsock;
> > 	bool seqpacket_allow = false;
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 7f334a32133c..29290395054c 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -153,6 +153,27 @@ static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
> > 	VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
> > }
> > 
> > +static inline struct sk_buff *
> > +virtio_vsock_alloc_rx_skb(unsigned int size, gfp_t mask)
> > +{
> > +	struct sk_buff *skb;
> > +
> > +	skb = virtio_vsock_alloc_linear_skb(size, mask);
> > +	if (!skb)
> > +		return NULL;
> > +
> > +	memset(skb->head, 0, VIRTIO_VSOCK_SKB_HEADROOM);
> > +
> > +	/* virtio-vsock does not yet support namespaces, so on receive
> > +	 * we force legacy namespace behavior using the global dummy net
> > +	 * and global net mode.
> > +	 */
> > +	virtio_vsock_skb_set_net(skb, vsock_global_dummy_net());
> > +	virtio_vsock_skb_set_net_mode(skb, VSOCK_NET_MODE_GLOBAL);
> > +
> > +	return skb;
> > +}
> 
> Why are we introducing this change in this patch?
> 
> Where is the net of the virtio skb read?
> 

Good point, this is an odd place for it. I'll move it to the patch where
it is actually used.

[...]

> > 
> > +static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > +				 void *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	char data[VSOCK_NET_MODE_STR_MAX] = {0};
> > +	enum vsock_net_mode mode;
> > +	struct ctl_table tmp;
> > +	struct net *net;
> > +	int ret;
> > +
> > +	if (!table->data || !table->maxlen || !*lenp) {
> > +		*lenp = 0;
> > +		return 0;
> > +	}
> > +
> > +	net = current->nsproxy->net_ns;
> > +	tmp = *table;
> > +	tmp.data = data;
> > +
> > +	if (!write) {
> > +		const char *p;
> > +
> > +		mode = vsock_net_mode(net);
> > +
> > +		switch (mode) {
> > +		case VSOCK_NET_MODE_GLOBAL:
> > +			p = VSOCK_NET_MODE_STR_GLOBAL;
> > +			break;
> > +		case VSOCK_NET_MODE_LOCAL:
> > +			p = VSOCK_NET_MODE_STR_LOCAL;
> > +			break;
> > +		default:
> > +			WARN_ONCE(true, "netns has invalid vsock mode");
> > +			*lenp = 0;
> > +			return 0;
> > +		}
> > +
> > +		strscpy(data, p, sizeof(data));
> > +		tmp.maxlen = strlen(p);
> > +	}
> > +
> > +	ret = proc_dostring(&tmp, write, buffer, lenp, ppos);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (write) {
> 
> Do we need to check some capability, e.g. CAP_NET_ADMIN ?
> 

We get that for free via the sysctl_net registration, through this path
on open (CAP_NET_ADMIN is checked in net_ctl_permissions):

	net_ctl_permissions+1
	sysctl_perm+24
	proc_sys_permission+117
	inode_permission+217
	link_path_walk+162
	path_openat+152
	do_filp_open+171
	do_sys_openat2+98
	__x64_sys_openat+69
	do_syscall_64+93

Verified with:

cp /bin/echo /tmp/echo_netadmin
setcap cap_net_admin+ep /tmp/echo_netadmin

(A non-root user's write fails with the regular echo and succeeds with
/tmp/echo_netadmin.)
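
For anyone who wants to repeat the check without copying echo around, here
is a rough userspace sketch of the same test. It is not part of the patch.
The helper name try_set_ns_mode() is made up for illustration, and the path
only exists on a kernel carrying this series. Since the permission check
sits on the open path shown above, I would expect an unprivileged caller to
fail at open() rather than at write(), but treat that as an assumption to
confirm on your own setup.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Path from the patch description; only present with this series applied. */
#define VSOCK_NS_MODE_PATH "/proc/sys/net/vsock/ns_mode"

/*
 * Hypothetical helper (not part of the patch): try to write `mode` to the
 * sysctl file at `path`. Returns 0 on success, or a negative errno value
 * from the step that failed.
 */
static int try_set_ns_mode(const char *path, const char *mode)
{
	size_t len = strlen(mode);
	ssize_t n;
	int fd, err;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;	/* a permission failure is expected to land here */

	n = write(fd, mode, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write, reported as a generic I/O error */
	else
		err = 0;

	close(fd);
	return err;
}
```

Calling try_set_ns_mode(VSOCK_NS_MODE_PATH, "local") from a small main()
and printing strerror(-ret) on failure should give the same pass/fail
split as the two echo binaries above: failure without CAP_NET_ADMIN,
success with it.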

Best regards,
Bobby
