lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ureyl5b2tneivmlce4fdtmuoxgayfxwgewoypb6oyxeh7ozt3i@chygpr2uvtcp>
Date: Wed, 12 Nov 2025 15:21:39 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: Bobby Eshleman <bobbyeshleman@...il.com>
Cc: Shuah Khan <shuah@...nel.org>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Stefan Hajnoczi <stefanha@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>, 
	Jason Wang <jasowang@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, 
	Eugenio Pérez <eperezma@...hat.com>, "K. Y. Srinivasan" <kys@...rosoft.com>, 
	Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, 
	Bryan Tan <bryan-bt.tan@...adcom.com>, Vishnu Dasa <vishnu.dasa@...adcom.com>, 
	Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>, virtualization@...ts.linux.dev, netdev@...r.kernel.org, 
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org, 
	linux-hyperv@...r.kernel.org, Sargun Dhillon <sargun@...gun.me>, berrange@...hat.com, 
	Bobby Eshleman <bobbyeshleman@...a.com>
Subject: Re: [PATCH net-next v9 08/14] vsock: reject bad VSOCK_NET_MODE_LOCAL
 configuration for G2H

On Tue, Nov 11, 2025 at 10:54:50PM -0800, Bobby Eshleman wrote:
>From: Bobby Eshleman <bobbyeshleman@...a.com>
>
>Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport
>is operational. Additionally, reject G2H transport registration if there
>already exists a namespace in local mode.
>
>G2H sockets break in local mode because the G2H transports don't support
>namespacing yet. The current approach is to coerce packets coming out of
>G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible
>to coerce sockets in the same way because it cannot be deduced which
>transport will be used by the socket. Specifically, when bound to
>VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not
>until a packet is received and matched to the bound socket that we
>assign the transport. This presents a chicken-and-egg problem, because
>we need the namespace to lookup the socket and resolve the transport,
>but we need the transport to know how to use the namespace during
>lookup.
>
>For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being
>used on systems that support G2H, even nested systems that also have H2G
>transports.
>
>Local mode is blocked based on detecting the presence of G2H devices
>(when possible, as hyperv is special). This means that a host kernel
>with G2H support compiled in (or has the module loaded), will still
>support local mode because there is no G2H (e.g., virtio-vsock) device
>detected. This enables using the same kernel in the host and in the
>guest, as we do in kselftest.
>
>Systems with only namespace-aware transports (vhost-vsock, loopback) can
>still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as
>intended.
>
>The hyperv transport must be treated specially. Other G2H transports can
>can report presence of a device using get_local_cid(). When a device is
>present it returns a valid CID; otherwise, it returns VMADDR_CID_ANY.
>THe hyperv transport's get_local_cid() always returns VMADDR_CID_ANY,
>however, even when a device is present.
>
>For that reason, this patch adds an always_block_local_mode flag to
>struct vsock_transport. When set to true, VSOCK_NET_MODE_LOCAL is
>blocked unconditionally whenever the transport is registered, regardless
>of device presence. When false, LOCAL mode is only blocked when
>get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
>
>The hyperv transport sets this flag to true to unconditionally block
>local mode. Other G2H transports (virtio-vsock, vmci-vsock) leave it
>false and continue using device detection via get_local_cid() to block
>local mode.
>
>These restrictions can be lifted in a future patch series when G2H
>transports gain namespace support.

IMO this commit should be before supporting namespaces in any device,
so we are sure we don't have commits where this can happen.

>
>Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
>---
> include/net/af_vsock.h           |  8 +++++++
> net/vmw_vsock/af_vsock.c         | 45 +++++++++++++++++++++++++++++++++++++---
> net/vmw_vsock/hyperv_transport.c |  1 +
> 3 files changed, 51 insertions(+), 3 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index cfd121bb5ab7..089c61105dda 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -108,6 +108,14 @@ struct vsock_transport_send_notify_data {
>
> struct vsock_transport {
> 	struct module *module;
>+	/* If true, block VSOCK_NET_MODE_LOCAL unconditionally when this G2H
>+	 * transport is registered. If false, only block LOCAL mode when
>+	 * get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
>+	 * Hyperv sets this true because it doesn't offer a callback that
>+	 * detects device presence. This only applies to G2H transports; H2G
>+	 * transports are unaffected.
>+	 */
>+	bool always_block_local_mode;
>
> 	/* Initialize/tear-down socket. */
> 	int (*init)(struct vsock_sock *, struct vsock_sock *);
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index c0b5946bdc95..a2da1810b802 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -91,6 +91,11 @@
>  *   and locked down by a namespace manager. The default is "global". The mode
>  *   is set per-namespace.
>  *
>+ *   Note: LOCAL mode is only supported when using namespace-aware transports
>+ *   (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
>+ *   hyperv-vsock, vmci-vsock) is loaded, attempts to set LOCAL mode will fail
>+ *   with EOPNOTSUPP, as these transports do not support per-namespace 
>isolation.
>+ *
>  *   The modes affect the allocation and accessibility of CIDs as follows:
>  *
>  *   - global - access and allocation are all system-wide
>@@ -2757,12 +2762,30 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write,
> 		if (*lenp >= sizeof(data))
> 			return -EINVAL;
>
>-		if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
>+		if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) {
> 			mode = VSOCK_NET_MODE_GLOBAL;
>-		else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
>+		} else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) {
>+			/* LOCAL mode is not supported when G2H transports
>+			 * (virtio-vsock, hyperv, vmci) are active, because
>+			 * these transports don't support namespaces. We must
>+			 * stay in GLOBAL mode to avoid bind/lookup mismatches.
>+			 *
>+			 * Check if G2H transport is present and either:
>+			 * 1. Has always_block_local_mode set (hyperv), OR
>+			 * 2. Has an actual device present (get_local_cid() != VMADDR_CID_ANY)
>+			 */
>+			mutex_lock(&vsock_register_mutex);
>+			if (transport_g2h &&
>+			    (transport_g2h->always_block_local_mode ||
>+			     transport_g2h->get_local_cid() != VMADDR_CID_ANY)) {

This seems almost like a hack. What about adding a new function in the 
transports that tells us whether a device is present or not and 
implement it in all of them?

Or a more specific function to check if the transport can work with 
local mode (e.g.  netns_local_aware() or something like that - I'm not 
great with nameming xD)

>+				mutex_unlock(&vsock_register_mutex);
>+				return -EOPNOTSUPP;
>+			}
>+			mutex_unlock(&vsock_register_mutex);

What happen if the G2H is loaded here, just after we release the mutex?

I suspect there could be a race that we may fix postponing the unlock 
after the vsock_net_write_mode() call.

WDYT?

> 			mode = VSOCK_NET_MODE_LOCAL;
>-		else
>+		} else {
> 			return -EINVAL;
>+		}
>
> 		if (!vsock_net_write_mode(net, mode))
> 			return -EPERM;
>@@ -2909,6 +2932,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> {
> 	const struct vsock_transport *t_h2g, *t_g2h, *t_dgram, *t_local;
> 	int err = mutex_lock_interruptible(&vsock_register_mutex);
>+	struct net *net;
>
> 	if (err)
> 		return err;
>@@ -2931,6 +2955,21 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> 			err = -EBUSY;
> 			goto err_busy;
> 		}
>+
>+		/* G2H sockets break in LOCAL mode namespaces because G2H transports
>+		 * don't support them yet. Block registering new G2H transports if we
>+		 * already have local mode namespaces on the system.
>+		 */
>+		rcu_read_lock();
>+		for_each_net_rcu(net) {
>+			if (vsock_net_mode(net) == VSOCK_NET_MODE_LOCAL) {
>+				rcu_read_unlock();
>+				err = -EOPNOTSUPP;
>+				goto err_busy;
>+			}
>+		}
>+		rcu_read_unlock();
>+
> 		t_g2h = t;
> 	}
>
>diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>index 432fcbbd14d4..ed48dd1ff19b 100644
>--- a/net/vmw_vsock/hyperv_transport.c
>+++ b/net/vmw_vsock/hyperv_transport.c
>@@ -835,6 +835,7 @@ int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val)
>
> static struct vsock_transport hvs_transport = {
> 	.module                   = THIS_MODULE,
>+	.always_block_local_mode  = true,
>
> 	.get_local_cid            = hvs_get_local_cid,
>
>
>-- 
>2.47.3
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ