Message-ID: <20190523153703.GC19296@stefanha-x1.localdomain>
Date: Thu, 23 May 2019 16:37:03 +0100
From: Stefan Hajnoczi <stefanha@...hat.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: netdev@...r.kernel.org, Dexuan Cui <decui@...rosoft.com>,
Jorgen Hansen <jhansen@...are.com>,
"David S. Miller" <davem@...emloft.net>,
Vishnu Dasa <vdasa@...are.com>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Sasha Levin <sashal@...nel.org>
Subject: Re: [RFC] vsock: proposal to support multiple transports at runtime
On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> Hi guys,
> I'm currently interested in implementing multi-transport support for VSOCK in
> order to handle nested VMs.
>
> As Stefan suggested, I started to look at this discussion:
> https://lkml.org/lkml/2017/8/17/551
> Below I tried to summarize a proposal for a discussion, following the ideas
> from Dexuan, Jorgen, and Stefan.
>
>
> We can define two types of transport that we have to handle at the same time
> (e.g. in a nested VM we would have both types of transport running together):
>
> - 'host side transport': it runs in the host and is used to communicate with
> the guests of a specific hypervisor (KVM, VMware or Hyper-V)
>
> Should we support multiple 'host side transport' running at the same time?
>
> - 'guest side transport': it runs in the guest and is used to communicate
> with the host transport

I find this terminology confusing. Perhaps "host->guest" (your 'host
side transport') and "guest->host" (your 'guest side transport') are
clearer?

Or maybe the nested virtualization terminology of L2 transport (your
'host side transport') and L0 transport (your 'guest side transport')?
Here we are the L1 guest, L0 is the host, and L2 is our nested guest.

>
>
> The main goal is to find a way to decide which transport to use in these cases:
> 1. connect() / sendto()
>
> a. use the 'host side transport', if the destination is the guest
> (dest_cid > VMADDR_CID_HOST).
> If we want to support multiple 'host side transport' running at the
> same time, we should assign CIDs uniquely across all transports.
> In this way, a packet generated by the host side will get directed
> to the appropriate transport based on the CID
The multiple host side transport case is unlikely to be necessary on x86
where only one hypervisor uses VMX at any given time. But eventually it
may happen so it's wise to at least allow it in the design.
>
> b. use the 'guest side transport', if the destination is the host
> (dest_cid == VMADDR_CID_HOST)
Makes sense to me.
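
To make rule 1 concrete, here is a minimal sketch in Python. It is a
user-space model for discussion only, not kernel code. The function name,
the `host_transports` mapping, and the transport labels are assumptions
made for illustration; only the value of VMADDR_CID_HOST comes from
<linux/vm_sockets.h>.

```python
# Illustrative user-space model only; not kernel code.
VMADDR_CID_HOST = 2  # well-known CID of the host, as in <linux/vm_sockets.h>


def select_connect_transport(dest_cid, guest_transport, host_transports):
    """Pick a transport for connect()/sendto() following rule 1.

    guest_transport: label of the guest->host transport, or None if absent.
    host_transports: hypothetical dict mapping each guest CID to the
        host->guest transport that owns it (CIDs unique across transports).
    """
    if dest_cid == VMADDR_CID_HOST:
        # Rule 1b: the destination is the host, so use the guest->host transport.
        if guest_transport is None:
            raise LookupError("no guest->host transport loaded")
        return guest_transport
    if dest_cid > VMADDR_CID_HOST:
        # Rule 1a: the destination is a guest; its CID identifies the transport.
        try:
            return host_transports[dest_cid]
        except KeyError:
            raise LookupError(f"no host->guest transport owns CID {dest_cid}") from None
    raise ValueError(f"reserved CID {dest_cid}")
```

With unique CIDs, a call such as
`select_connect_transport(3, "virtio", {3: "vhost", 7: "vmci"})` resolves
to a single host->guest transport without any extra hint from the
application.
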
>
>
> 2. listen() / recvfrom()
>
> a. use the 'host side transport', if the socket is bound to
> VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> guest transport.
> We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> address this case.
> If we want to support multiple 'host side transport' running at the
> same time, we should find a way to allow an application to bind to a
> specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)

Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
VMADDR_CID_LISTEN_FROM_HYPERV aren't very flexible. What if my service
should only be available to a subset of VMware VMs?

Instead it might be more appropriate to use network namespaces to create
independent AF_VSOCK addressing domains. Then you could have two
separate groups of VMware VMs and selectively listen to just one group.
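
A rough Python model of what I mean by independent addressing domains.
Everything here is hypothetical (the class, its methods, the port number,
and the assumption that a CID may be reused across domains); it only
illustrates the visibility rule, not how the kernel would implement it.

```python
# Illustrative model only; the class and method names are hypothetical.
class VsockNamespace:
    """One independent AF_VSOCK addressing domain (e.g. one network namespace)."""

    def __init__(self, name):
        self.name = name
        self.guest_cids = set()   # CIDs of the VMs attached to this domain
        self.listeners = {}       # port -> name of the listening service

    def attach_guest(self, cid):
        self.guest_cids.add(cid)

    def listen(self, port, service):
        self.listeners[port] = service

    def connect_from_guest(self, cid, port):
        """Return the service reached by guest `cid` on `port`, or None."""
        if cid not in self.guest_cids:
            return None  # the guest is not part of this addressing domain
        return self.listeners.get(port)


# Two separate groups of VMs. CID 3 is reused on purpose: this sketch
# assumes the domains are fully independent.
group_a = VsockNamespace("group-a")
group_b = VsockNamespace("group-b")
group_a.attach_guest(3)
group_b.attach_guest(3)
group_b.attach_guest(4)

# The service listens in group A only, so group B guests cannot reach it.
group_a.listen(1234, "my-service")
```
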
>
> b. use the 'guest side transport', if the socket is bound to a local CID
> different from the VMADDR_CID_HOST (the guest CID obtained with
> IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> (to be backward compatible).
> Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
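
Read literally, rule 2 reduces to a small decision function. The sketch
below is a user-space Python model, not kernel code. The function name,
its parameters, and the returned labels are assumptions for illustration,
and the proposed VMADDR_CID_LISTEN_FROM_* constants are deliberately left
out. The two CID values are the ones defined in <linux/vm_sockets.h>.

```python
# Illustrative user-space model only; not kernel code.
VMADDR_CID_ANY = 0xFFFFFFFF  # -1U in <linux/vm_sockets.h>
VMADDR_CID_HOST = 2


def select_listen_transport(bound_cid, local_cid, guest_loaded):
    """Pick a transport for listen()/recvfrom() following rule 2.

    bound_cid:    CID the socket was bound to.
    local_cid:    CID assigned to us by the host, or None outside a guest.
    guest_loaded: True if a guest->host transport is available.
    """
    if bound_cid == VMADDR_CID_HOST:
        # Rule 2a: explicitly bound to the host address.
        return "host->guest"
    if bound_cid == VMADDR_CID_ANY:
        # Rule 2b keeps the old behaviour when a guest transport exists;
        # otherwise rule 2a applies.
        return "guest->host" if guest_loaded else "host->guest"
    if guest_loaded and bound_cid == local_cid:
        # Rule 2b: bound to our own guest CID.
        return "guest->host"
    raise ValueError(f"CID {bound_cid} is not a local address")
```

Note that in an L1 guest a socket bound to VMADDR_CID_ANY only ever
listens on the guest->host transport under these rules, which is the
backward-compatible choice the proposal describes.
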
Two additional topics:
1. How will loading af_vsock.ko change? In particular, can an
application create a socket in af_vsock.ko without any loaded
transport? Can it enter listen state without any loaded transport
(this seems useful with VMADDR_CID_ANY)?
2. Does your proposed behavior match VMware's existing nested vsock
semantics?