Message-ID: <20190523153703.GC19296@stefanha-x1.localdomain>
Date:   Thu, 23 May 2019 16:37:03 +0100
From:   Stefan Hajnoczi <stefanha@...hat.com>
To:     Stefano Garzarella <sgarzare@...hat.com>
Cc:     netdev@...r.kernel.org, Dexuan Cui <decui@...rosoft.com>,
        Jorgen Hansen <jhansen@...are.com>,
        "David S. Miller" <davem@...emloft.net>,
        Vishnu Dasa <vdasa@...are.com>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Sasha Levin <sashal@...nel.org>
Subject: Re: [RFC] vsock: proposal to support multiple transports at runtime

On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> Hi guys,
> I'm currently interested in implementing multi-transport support for VSOCK in
> order to handle nested VMs.
> 
> As Stefan suggested, I started looking at this discussion:
> https://lkml.org/lkml/2017/8/17/551
> Below I tried to summarize a proposal for a discussion, following the ideas
> from Dexuan, Jorgen, and Stefan.
> 
> 
> We can define two types of transport that we have to handle at the same time
> (e.g. in a nested VM we would have both types of transport running together):
> 
> - 'host side transport': it runs in the host and is used to communicate with
>   the guests of a specific hypervisor (KVM, VMware or Hyper-V)
> 
>   Should we support multiple 'host side transport' running at the same time?
> 
> - 'guest side transport': it runs in the guest and is used to communicate
>   with the host transport

I find this terminology confusing.  Perhaps "host->guest" (your 'host
side transport') and "guest->host" (your 'guest side transport') are
clearer?

Or maybe the nested virtualization terminology of L2 transport (your
'host side transport') and L0 transport (your 'guest side transport')?
Here we are the L1 guest and L0 is the host and L2 is our nested guest.

> 
> 
> The main goal is to find a way to decide which transport to use in these cases:
> 1. connect() / sendto()
> 
> 	a. use the 'host side transport', if the destination is the guest
> 	   (dest_cid > VMADDR_CID_HOST).
> 	   If we want to support multiple 'host side transport' running at the
> 	   same time, we should assign CIDs uniquely across all transports.
> 	   In this way, a packet generated by the host side will get directed
> 	   to the appropriate transport based on the CID

The multiple host side transport case is unlikely to be necessary on x86
where only one hypervisor uses VMX at any given time.  But eventually it
may happen so it's wise to at least allow it in the design.

> 
> 	b. use the 'guest side transport', if the destination is the host
> 	   (dest_cid == VMADDR_CID_HOST)

Makes sense to me.
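
As a rough illustration of rules 1a and 1b above, the choice on connect() /
sendto() could be sketched as below.  This is only a user-space sketch, not
code from af_vsock: the enum and the function name are hypothetical, and
returning "none" for VMADDR_CID_ANY and for the reserved CIDs 0 and 1 is an
assumption, since the proposal does not say what should happen for them.

```c
#include <stdint.h>

/* Same values as <linux/vm_sockets.h>; repeated so the sketch stands alone. */
#ifndef VMADDR_CID_ANY
#define VMADDR_CID_ANY  0xFFFFFFFFu
#endif
#ifndef VMADDR_CID_HOST
#define VMADDR_CID_HOST 2u
#endif

/* Hypothetical names, used only for this illustration. */
enum vsock_transport_kind {
    VSOCK_TRANSPORT_NONE,  /* no rule in the proposal covers this CID */
    VSOCK_TRANSPORT_G2H,   /* guest->host ('guest side transport') */
    VSOCK_TRANSPORT_H2G,   /* host->guest ('host side transport') */
};

static enum vsock_transport_kind
vsock_pick_connect_transport(uint32_t dest_cid)
{
    /* Assumption: VMADDR_CID_ANY is not a valid destination.  It must be
     * checked first because it also compares greater than VMADDR_CID_HOST. */
    if (dest_cid == VMADDR_CID_ANY)
        return VSOCK_TRANSPORT_NONE;

    /* Rule 1b: the destination is the host. */
    if (dest_cid == VMADDR_CID_HOST)
        return VSOCK_TRANSPORT_G2H;

    /* Rule 1a: the destination is a guest. */
    if (dest_cid > VMADDR_CID_HOST)
        return VSOCK_TRANSPORT_H2G;

    /* Assumption: CIDs 0 and 1 are reserved and get no transport. */
    return VSOCK_TRANSPORT_NONE;
}
```

With several host->guest transports loaded, the last branch before the
fallback would additionally need a lookup from CID to the owning transport,
which is why CIDs would have to be unique across transports.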

> 
> 
> 2. listen() / recvfrom()
> 
> 	a. use the 'host side transport', if the socket is bound to
> 	   VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> 	   guest transport.
> 	   We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> 	   address this case.
> 	   If we want to support multiple 'host side transport' running at the
> 	   same time, we should find a way to allow an application to bind to a
> 	   specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> 	   VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)

Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
should only be available to a subset of VMware VMs?

Instead it might be more appropriate to use network namespaces to create
independent AF_VSOCK addressing domains.  Then you could have two
separate groups of VMware VMs and selectively listen to just one group.

> 
> 	b. use the 'guest side transport', if the socket is bound to a local CID
> 	   different from VMADDR_CID_HOST (the guest CID obtained with
> 	   IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> 	   (to be backward compatible).
> 	   Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
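
For reference, rules 2a and 2b as quoted above (without the proposed
VMADDR_CID_LISTEN_FROM_* constants) could be sketched roughly as follows.
The enum, the function name and its parameters are hypothetical, and the
sketch assumes a host->guest transport is always available, which the
proposal does not guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as <linux/vm_sockets.h>; repeated so the sketch stands alone. */
#ifndef VMADDR_CID_ANY
#define VMADDR_CID_ANY  0xFFFFFFFFu
#endif
#ifndef VMADDR_CID_HOST
#define VMADDR_CID_HOST 2u
#endif

/* Hypothetical names, used only for this illustration. */
enum vsock_transport_kind {
    VSOCK_TRANSPORT_NONE,  /* no rule in the proposal covers this bind */
    VSOCK_TRANSPORT_G2H,   /* guest->host ('guest side transport') */
    VSOCK_TRANSPORT_H2G,   /* host->guest ('host side transport') */
};

/*
 * bound_cid: CID the socket was bound to
 * local_cid: our own guest CID (what IOCTL_VM_SOCKETS_GET_LOCAL_CID reports)
 * have_g2h:  whether a guest->host transport is loaded
 */
static enum vsock_transport_kind
vsock_pick_listen_transport(uint32_t bound_cid, uint32_t local_cid,
                            bool have_g2h)
{
    /* Rule 2a: bound to VMADDR_CID_HOST means listening for guests. */
    if (bound_cid == VMADDR_CID_HOST)
        return VSOCK_TRANSPORT_H2G;

    /* VMADDR_CID_ANY: prefer the guest transport for backward
     * compatibility (2b) and fall back to the host transport (2a). */
    if (bound_cid == VMADDR_CID_ANY)
        return have_g2h ? VSOCK_TRANSPORT_G2H : VSOCK_TRANSPORT_H2G;

    /* Rule 2b: bound to our own guest CID. */
    if (have_g2h && bound_cid == local_cid)
        return VSOCK_TRANSPORT_G2H;

    return VSOCK_TRANSPORT_NONE;
}
```

Note that a VMADDR_CID_ANY listener ends up on exactly one transport here,
so in a nested setup it would not hear from both the host and the guests.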

Two additional topics:

1. How will loading af_vsock.ko change?  In particular, can an
   application create a socket in af_vsock.ko without any loaded
   transport?  Can it enter listen state without any loaded transport
   (this seems useful with VMADDR_CID_ANY)?

2. Does your proposed behavior match VMware's existing nested vsock
   semantics?
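
To make topic 1 concrete, this is the user-space sequence in question,
sketched with the existing AF_VSOCK API from <linux/vm_sockets.h>.  The two
helper names are hypothetical.  Which of the three calls fails when no
transport is loaded, if any, is exactly what the design would have to
define, so the sketch only reports failure and makes no claim about it.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill in a wildcard AF_VSOCK address for the given port. */
static void vsock_fill_any_addr(struct sockaddr_vm *addr, unsigned int port)
{
    memset(addr, 0, sizeof(*addr));
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = VMADDR_CID_ANY;
    addr->svm_port = port;
}

/* Returns a listening fd, or -1 with errno set by the call that failed. */
static int vsock_listen_any(unsigned int port)
{
    struct sockaddr_vm addr;
    int fd, saved_errno;

    fd = socket(AF_VSOCK, SOCK_STREAM, 0);   /* step 1: create */
    if (fd < 0)
        return -1;

    vsock_fill_any_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||  /* step 2 */
        listen(fd, 1) < 0) {                                     /* step 3 */
        saved_errno = errno;
        close(fd);
        errno = saved_errno;  /* keep the error from bind()/listen() */
        return -1;
    }
    return fd;
}
```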
