lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aSSV4RlRcW+uGy+n@devvm11784.nha0.facebook.com>
Date: Mon, 24 Nov 2025 09:29:05 -0800
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Stefan Hajnoczi <stefanha@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Jason Wang <jasowang@...hat.com>,
	Eugenio Pérez <eperezma@...hat.com>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	Bryan Tan <bryan-bt.tan@...adcom.com>,
	Vishnu Dasa <vishnu.dasa@...adcom.com>,
	Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
	Shuah Khan <shuah@...nel.org>, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux.dev, netdev@...r.kernel.org,
	kvm@...r.kernel.org, linux-hyperv@...r.kernel.org,
	linux-kselftest@...r.kernel.org, berrange@...hat.com,
	Sargun Dhillon <sargun@...gun.me>,
	Bobby Eshleman <bobbyeshleman@...a.com>
Subject: Re: [PATCH net-next v11 03/13] vsock: reject bad
 VSOCK_NET_MODE_LOCAL configuration for G2H

On Mon, Nov 24, 2025 at 11:10:19AM +0100, Stefano Garzarella wrote:
> On Fri, Nov 21, 2025 at 11:01:53AM -0800, Bobby Eshleman wrote:
> > On Fri, Nov 21, 2025 at 03:24:25PM +0100, Stefano Garzarella wrote:
> > > On Thu, Nov 20, 2025 at 09:44:35PM -0800, Bobby Eshleman wrote:
> > > > From: Bobby Eshleman <bobbyeshleman@...a.com>
> > > >
> > > > Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport
> > > > is operational. Additionally, reject G2H transport registration if there
> > > > already exists a namespace in local mode.
> > > >
> > > > G2H sockets break in local mode because the G2H transports don't support
> > > > namespacing yet. The current approach is to coerce packets coming out of
> > > > G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible
> > > > to coerce sockets in the same way because it cannot be deduced which
> > > > transport will be used by the socket. Specifically, when bound to
> > > > VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not
> > > > until a packet is received and matched to the bound socket that we
> > > > assign the transport. This presents a chicken-and-egg problem, because
> > > > we need the namespace to lookup the socket and resolve the transport,
> > > > but we need the transport to know how to use the namespace during
> > > > lookup.
> > > >
> > > > For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being
> > > > used on systems that support G2H, even nested systems that also have H2G
> > > > transports.
> > > >
> > > > Local mode is blocked based on detecting the presence of G2H devices
> > > > (when possible, as hyperv is special). This means that a host kernel
> > > > with G2H support compiled in (or has the module loaded), will still
> > > > support local mode if there is no G2H (e.g., virtio-vsock) device
> > > > detected. This enables using the same kernel in the host and in the
> > > > guest, as we do in kselftest.
> > > >
> > > > Systems with only namespace-aware transports (vhost-vsock, loopback) can
> > > > still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as
> > > > intended.
> > > >
> > > > Add supports_local_mode() transport callback to indicate
> > > > transport-specific local mode support.
> > > >
> > > > These restrictions can be lifted in a future patch series when G2H
> > > > transports gain namespace support.
> > > >
> > > > Signed-off-by: Bobby Eshleman <bobbyeshleman@...a.com>
> > > > ---
> > > > Changes in v11:
> > > > - vhost_transport_supports_local_mode() returns false to keep things
> > > >  disabled until support comes online (Stefano)
> > > > - add comment above supports_local_mode() cb to clarify (Stefano)
> > > > - Remove redundant `ret = 0` initialization in vsock_net_mode_string()
> > > >  (Stefano)
> > > > - Refactor vsock_net_mode_string() to separate parsing from validation
> > > >  (Stefano)
> > > > - vmci returns false for supports_local_mode(), with comment
> > > >
> > > > Changes in v10:
> > > > - move this patch before any transports bring online namespacing (Stefano)
> > > > - move vsock_net_mode_string into critical section (Stefano)
> > > > - add ->supports_local_mode() callback to transports (Stefano)
> > > > ---
> > > > drivers/vhost/vsock.c            |  6 ++++++
> > > > include/net/af_vsock.h           | 11 +++++++++++
> > > > net/vmw_vsock/af_vsock.c         | 32 ++++++++++++++++++++++++++++++++
> > > > net/vmw_vsock/hyperv_transport.c |  6 ++++++
> > > > net/vmw_vsock/virtio_transport.c | 13 +++++++++++++
> > > > net/vmw_vsock/vmci_transport.c   | 12 ++++++++++++
> > > > net/vmw_vsock/vsock_loopback.c   |  6 ++++++
> > > > 7 files changed, 86 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index 69074656263d..4e3856aa2479 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -64,6 +64,11 @@ static u32 vhost_transport_get_local_cid(void)
> > > > 	return VHOST_VSOCK_DEFAULT_HOST_CID;
> > > > }
> > > >
> > > > +static bool vhost_transport_supports_local_mode(void)
> > > > +{
> > > > +	return false;
> > > > +}
> > > > +
> > > > /* Callers that dereference the return value must hold vhost_vsock_mutex or the
> > > >  * RCU read lock.
> > > >  */
> > > > @@ -412,6 +417,7 @@ static struct virtio_transport vhost_transport = {
> > > > 		.module                   = THIS_MODULE,
> > > >
> > > > 		.get_local_cid            = vhost_transport_get_local_cid,
> > > > +		.supports_local_mode	  = vhost_transport_supports_local_mode,
> > > >
> > > > 		.init                     = virtio_transport_do_socket_init,
> > > > 		.destruct                 = virtio_transport_destruct,
> > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > index 59d97a143204..e24ef1d9fe02 100644
> > > > --- a/include/net/af_vsock.h
> > > > +++ b/include/net/af_vsock.h
> > > > @@ -180,6 +180,17 @@ struct vsock_transport {
> > > > 	/* Addressing. */
> > > > 	u32 (*get_local_cid)(void);
> > > >
> > > > +	/* Return true if the transport is compatible with
> > > > +	 * VSOCK_NET_MODE_LOCAL. Otherwise, return false.
> > > > +	 *
> > > > +	 * Transports should return false if they lack local-mode namespace
> > > > +	 * support (e.g., G2H transports like hyperv-vsock and vmci-vsock).
> > > > +	 * virtio-vsock returns true only if no device is present in order to
> > > > +	 * enable local mode in nested scenarios in which virtio-vsock is
> > > > +	 * loaded or built-in, but nonetheless unusable by sockets.
> > > > +	 */
> > > > +	bool (*supports_local_mode)(void);
> > > > +
> > > > 	/* Read a single skb */
> > > > 	int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
> > > >
> > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > > index 243c0d588682..120adb9dad9f 100644
> > > > --- a/net/vmw_vsock/af_vsock.c
> > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > @@ -91,6 +91,12 @@
> > > >  *   and locked down by a namespace manager. The default is "global". The mode
> > > >  *   is set per-namespace.
> > > >  *
> > > > + *   Note: LOCAL mode is only supported when using namespace-aware transports
> > > > + *   (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
> > > > + *   hyperv-vsock, vmci-vsock) is operational, attempts to set LOCAL mode will
> > > > + *   fail with EOPNOTSUPP, as these transports do not support per-namespace
> > > > + *   isolation.
> > > > + *
> > > >  *   The modes affect the allocation and accessibility of CIDs as follows:
> > > >  *
> > > >  *   - global - access and allocation are all system-wide
> > > > @@ -2794,6 +2800,15 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write,
> > > > 	else
> > > > 		return -EINVAL;
> > > >
> > > > +	mutex_lock(&vsock_register_mutex);
> > > > +	if (mode == VSOCK_NET_MODE_LOCAL &&
> > > > +	    transport_g2h && transport_g2h->supports_local_mode &&
> > > > +	    !transport_g2h->supports_local_mode()) {
> > > > +		mutex_unlock(&vsock_register_mutex);
> > > > +		return -EOPNOTSUPP;
> > > > +	}
> > > > +	mutex_unlock(&vsock_register_mutex);
> > > 
> > > Wait, I think we already discussed about this, vsock_net_write_mode() must
> > > be called with the lock held.
> > > 
> > > See
> > > https://lore.kernel.org/netdev/aRTTwuuXSz5CvNjt@devvm11784.nha0.facebook.com/
> > > 
> > 
> > Ah right, oversight on my part.
> > 
> > > Since I guess we need another version of this patch, can you check the
> > > commit description to see if it reflects what we are doing now
> > > (e.g vhost is not enabled)?
> > > 
> > > Also I don't understand why for vhost we will enable it later, but for
> > > virtio_transport and vsock_loopback we are enabling it now, also if this
> > > patch is before the support on that transports. I'm a bit confused.
> > > 
> > > If something is unclear, let's discuss it before sending a new version.
> > > 
> > > 
> > > What I had in mind was, add this patch and explain why we need this new
> > > callback (like you did), but enable the support in the patches that
> > > really enable it for any transport. But maybe what is not clear to me is
> > > that we need this only for G2H. But now I'm confused about the discussion
> > > around vmci H2G. We decided to discard also that one, but here we are not
> > > checking that?
> > > I mean here we are calling supports_local_mode() only on G2H IIUC.
> > 
> > Ah right, VMCI broke my original mental model of only needing this check
> > for G2H (originally I didn't realize VMCI was H2G too).
> > 
> > I think now, we actually need to do this check for all of the transports
> > no? Including h2g, g2h, local, and dgram?
> > 
> > Additionally, the commit description needs to be updated to reflect that.
> 
> Let's take a step back, though, because I tried to understand the problem
> better and I'm confused.
> 
> For example, in vmci (G2H side), when a packet arrives, we always use
> vsock_find_connected_socket(), which only searches in GLOBAL. So connections
> originating from the host can only reach global sockets in the guest. In
> this direction (host -> guest), we should be fine, right?
> 
> Now let's consider the other direction, from guest to host, so the
> connection should be generated via vsock_connect().
> Here I see that we are not doing anything with regard to the source
> namespace. At this point, my question is whether we should modify
> vsock_assign_transport() or transport->stream_allow() to do this for each
> stream, and not prevent loading a G2H module a priori.
> 
> For example, stream_allow() could check that the socket namespace is
> supported by the assigned transport. E.g., vmci can check that if the
> namespace mode is not GLOBAL, then it returns false. (Same thing in
> virtio-vsock, I mean the G2H driver).
> 
> This should solve the guest -> host direction, but at this point I wonder if
> I'm missing something.

For the G2H connect case that is true, but the situation gets a little
fuzzier on the G2H RX side w/ VMADDR_CID_ANY listeners.

Let's say we have a nested system w/ both virtio-vsock and vhost-vsock.
We have a listener in namespace local on VMADDR_CID_ANY. So far, no
transport is assigned, so we can't call t->stream_allow() yet.
virtio-vsock only knows of global mode, so its lookup will fail (unless
we hack in some special case to virtio_transport_recv_pkt() to scan
local namespaces). vhost-vsock will work as expected. Letting local mode
sockets be silently unreachable by virtio-vsock seems potentially
confusing for users. If the system were not nested, we can pre-resolve
VMADDR_CID_ANY in bind() and handle things upfront as well. Rejecting
local mode outright is just a broad guardrail.

If we're trying to find a less heavy-handed option, we might be able to
do the following:

- change bind(cid) w/ cid != VMADDR_CID_ANY to directly assign the transport
  for all socket types (not just SOCK_DGRAM)

- vsock_assign_transport() can outright fail if !t->supports_local_mode()
  and sock_net(sk) has mode local

- bind(VMADDR_CID_ANY) can maybe print (once) to dmesg a warning that
  only the H2G transport will land on VMADDR_CID_ANY sockets.

I'm certainly open to other suggestions.

Best,
Bobby

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ