Message-ID: <Z+26A3sslT+w+wOI@devvm6277.cco0.facebook.com>
Date: Wed, 2 Apr 2025 15:28:19 -0700
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Daniel P. Berrangé <berrange@...hat.com>
Cc: Stefano Garzarella <sgarzare@...hat.com>,
	Jakub Kicinski <kuba@...nel.org>,
	"K. Y. Srinivasan" <kys@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	Stefan Hajnoczi <stefanha@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Jason Wang <jasowang@...hat.com>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	Eugenio Pérez <eperezma@...hat.com>,
	Bryan Tan <bryan-bt.tan@...adcom.com>,
	Vishnu Dasa <vishnu.dasa@...adcom.com>,
	Broadcom internal kernel review list <bcm-kernel-feedback-list@...adcom.com>,
	"David S. Miller" <davem@...emloft.net>,
	virtualization@...ts.linux.dev, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
	kvm@...r.kernel.org
Subject: Re: [PATCH v2 0/3] vsock: add namespace support to vhost-vsock

On Wed, Apr 02, 2025 at 03:18:13PM -0700, Bobby Eshleman wrote:
> On Wed, Apr 02, 2025 at 10:21:36AM +0100, Daniel P. Berrangé wrote:
> > On Wed, Apr 02, 2025 at 10:13:43AM +0200, Stefano Garzarella wrote:
> > > On Wed, 2 Apr 2025 at 02:21, Bobby Eshleman <bobbyeshleman@...il.com> wrote:
> > > >
> > > > I do like Stefano's suggestion to add a sysctl for a "strict" mode,
> > > > since it offers the best of both worlds and still leans conservative in
> > > > protecting existing applications... but I agree, the non-strict mode
> > > > vsock would be unique WRT the usual concept of namespaces.
> > > 
> > > Maybe we could do the opposite, enable strict mode by default (I think 
> > > it was similar to what I had tried to do with the kernel module in v1, I 
> > > was young I know xD)
> > > And provide a way to disable it for those use cases where the user wants 
> > > backward compatibility, while paying the cost of less isolation.
> > 
> > I think backwards compatibility has to be the default behaviour, otherwise
> > the change has too high a risk of breaking existing deployments that are
> > already using netns and relying on VSOCK being global. Breakage has to
> > be opt-in.
> > 
> > > I was thinking two options (not sure if the second one can be done):
> > > 
> > >   1. provide a global sysfs/sysctl that disables strict mode, but this
> > >   then applies to all namespaces
> > > 
> > >   2. provide something that allows disabling strict mode by namespace.
> > >   Maybe when it is created there are options, or something that can be
> > >   set later.
> > > 
> > > 2 would be ideal, but that might be too much, so 1 might be enough. In 
> > > any case, 2 could also be a next step.
> > > 
> > > WDYT?
> > 
> > It occurred to me that the problem we face with the CID space usage is
> > somewhat similar to the UID/GID space usage for user namespaces.
> > 
> > In the latter case, userns has exposed /proc/$PID/uid_map & gid_map, to
> > allow IDs in the namespace to be arbitrarily mapped onto IDs in the host.
> > 
> > At the risk of being overkill, is it worth trying a similar kind of
> > approach for the vsock CID space ?
> > 
> > A simple variant would be a /proc/net/vsock_cid_outside specifying a set
> > of CIDs which are exclusively referencing /dev/vhost-vsock associations
> > created outside the namespace. Anything not listed would be exclusively
> > referencing associations created inside the namespace.
> > 
> > A more complex variant would be to allow a full remapping of CIDs as is
> > done with userns, via a /proc/net/vsock_cid_map, with the same three
> > parameters, so that a CID=15 association outside the namespace could be
> > remapped to CID=9015 inside the namespace, allowing the inside namespace
> > to define its own association for CID=15 without clashing.
> > 
> > IOW, mapped CIDs would be exclusively referencing /dev/vhost-vsock
> > associations created outside the namespace, while unmapped CIDs would be
> > exclusively referencing /dev/vhost-vsock associations created inside the
> > namespace.
> > 
> > A likely benefit of relying on a kernel defined mapping/partition of
> > the CID space is that apps like QEMU don't need changing, as there's
> > no need to invent a new /dev/vhost-vsock-netns device node.
> > 
> > Both approaches give the desirable security protection whereby the
> > inside namespace can be prevented from accessing certain CIDs that
> > were associated outside the namespace.
> > 
> > Some rule would need to be defined for updating the /proc/net/vsock_cid_map
> > file as it is the security control mechanism. If it is write-once then
> > if the container mgmt app initializes it, nothing later could change
> > it.
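
The write-once rule described above can be sketched as follows. This is
only an illustration, assuming a hypothetical in-memory object standing
in for /proc/net/vsock_cid_map; the class name, method names, and the
choice of PermissionError are assumptions, not anything proposed in the
thread.

```python
class WriteOnceCidMap:
    """Hypothetical stand-in for a write-once CID map file."""

    def __init__(self):
        # None means "never written"; an empty tuple is a valid written state.
        self._ranges = None

    def write(self, ranges):
        # Reject every write after the first, so whoever initializes the
        # map (e.g. the container mgmt app) fixes it for good.
        if self._ranges is not None:
            raise PermissionError("CID map already initialized")
        self._ranges = tuple(ranges)

    def read(self):
        return self._ranges if self._ranges is not None else ()
```

Here the first write() succeeds and any later write() fails, regardless
of who attempts it.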
> > 
> > A key question is do we need the "first come, first served" behaviour
> > for CIDs where a CID can be arbitrarily used by outside or inside namespace
> > according to whatever tries to associate a CID first ?
> 
> I think with /proc/net/vsock_cid_outside, instead of disallowing the CID
> from being used, this could be solved by disallowing remapping the CID
> while in use?
> 
> The thing I like about this is that users can check
> /proc/net/vsock_cid_outside to figure out what might be going on,
> instead of trying to check lsof or ps to figure out if the VMM processes
> have used /dev/vhost-vsock vs /dev/vhost-vsock-netns.
> 
> Just to check I am following... I suppose we would have a few typical
> configurations for /proc/net/vsock_cid_outside. Following uid_map file
> format of:
> 	"<local cid start>		<global cid start>		<range size>"
> 
> 	1. Identity mapping, current namespace CID is global CID (default
> 	setting for new namespaces):
> 
> 		# empty file
> 
> 				OR
> 
> 		0    0    4294967295
> 
> 	2. Complete isolation from global space (initialized, but no mappings):
> 
> 		0    0    0
> 
> 	3. Mapping in ranges of global CIDs
> 
> 	For example, global CID space starts at 7000, up to 32-bit max:
> 
> 		7000    0    4294960295
> 	
> 	Or for multiple mappings (0-100 map to 7000-7100, 1000-1100 map to
> 	8000-8100) :
> 
> 		7000    0       100
> 		8000    1000    100
> 
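
To make the three configurations above concrete, here is a minimal
sketch of how such a file could be interpreted, assuming the column
order quoted above ("<local cid start> <global cid start> <range size>")
and assuming an empty file means identity mapping. The function names
are hypothetical and nothing here reflects an actual kernel interface.

```python
# Identity mapping as written in configuration 1: "0 0 4294967295".
IDENTITY = [(0, 0, 4294967295)]


def parse_cid_map(text):
    """Parse 'local_start global_start size' lines into a list of tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected three fields: %r" % line)
        local_start, global_start, size = (int(f) for f in fields)
        ranges.append((local_start, global_start, size))
    # Assumption: an empty file is treated as the identity mapping.
    return ranges if ranges else list(IDENTITY)


def local_to_global(ranges, local_cid):
    """Return the global CID for a namespace-local CID, or None if unmapped."""
    for local_start, global_start, size in ranges:
        if local_start <= local_cid < local_start + size:
            return global_start + (local_cid - local_start)
    return None  # unmapped: the CID refers to a namespace-local association
```

With the two-range example, local CID 7005 resolves to global CID 5,
while local CID 15 is unmapped and stays inside the namespace. Note that
a range size of 100 covers 100 CIDs, so 7000 through 7099.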
> 
> One thing I don't love is that option 3 seems to not be addressing a
> known use case. It doesn't necessarily hurt to have, but it will add
> complexity to CID handling that might never get used?
> 
> Since options 1/2 could also be represented by a boolean (yes/no
> "current ns shares CID with global"), I wonder if we could either A)
> only support the first two options at first, or B) add just
> /proc/net/vsock_ns_mode at first, which supports only "global" and
> "local", and later add a "mapped" mode plus /proc/net/vsock_cid_outside
> or the full mapping if the need arises?
> 
> This could also be how we support Option 2 from Stefano's last email of
> supporting per-namespace opt-in/opt-out.
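
As a rough sketch of the simpler alternative, the two modes could select
which table a CID lookup consults. The file /proc/net/vsock_ns_mode
exists only as a proposal in this thread; the enum, the two dictionaries,
and the lookup function below are hypothetical names used purely for
illustration.

```python
from enum import Enum


class VsockNsMode(Enum):
    GLOBAL = "global"  # namespace shares the global CID space (option 1)
    LOCAL = "local"    # namespace has its own isolated CID space (option 2)


def parse_mode(text):
    """Parse the contents of the proposed mode file; unknown modes raise ValueError."""
    return VsockNsMode(text.strip())


def lookup(cid, mode, global_table, local_table):
    """Resolve a CID against the table selected by the namespace's mode."""
    table = global_table if mode is VsockNsMode.GLOBAL else local_table
    return table.get(cid)
```

A later "mapped" mode would then be a third enum value that consults a
mapping before choosing a table, without changing the two existing modes.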
> 
> Any thoughts on this?
> 

Stefano,

Would supporting only options 1 and 2 still cover the Kata use case?

Thanks,
Bobby
