Message-ID: <MWHPR05MB3376D035CA84FB6189CC1BFFDA1E0@MWHPR05MB3376.namprd05.prod.outlook.com>
Date:   Tue, 28 May 2019 16:01:00 +0000
From:   Jorgen Hansen <jhansen@...are.com>
To:     Stefano Garzarella <sgarzare@...hat.com>,
        Stefan Hajnoczi <stefanha@...hat.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Dexuan Cui <decui@...rosoft.com>,
        "David S. Miller" <davem@...emloft.net>,
        Vishnu DASA <vdasa@...are.com>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Sasha Levin <sashal@...nel.org>
Subject: Re: [RFC] vsock: proposal to support multiple transports at runtime

> On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> > On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > > Hi guys,
> > > I'm currently interested in implementing multi-transport support for VSOCK in
> > > order to handle nested VMs.

Thanks for picking this up!

> > >
> > > As Stefan suggested, I started looking at this discussion:
> > > https://lkml.org/lkml/2017/8/17/551
> > > Below I tried to summarize a proposal for discussion, following the ideas
> > > from Dexuan, Jorgen, and Stefan.
> > >
> > >
> > > We can define two types of transport that we have to handle at the same time
> > > (e.g. in a nested VM we would have both types of transport running together):
> > >
> > > - 'host side transport': it runs in the host and is used to communicate with
> > >   the guests of a specific hypervisor (KVM, VMware, or Hyper-V)
> > >
> > >   Should we support multiple 'host side transports' running at the same time?
> > >
> > > - 'guest side transport': it runs in the guest and is used to communicate
> > >   with the host transport
> >
> > I find this terminology confusing.  Perhaps "host->guest" (your 'host
> > side transport') and "guest->host" (your 'guest side transport') is
> > clearer?
>
> I agree, "host->guest" and "guest->host" are better, I'll use them.
>
> >
> > Or maybe the nested virtualization terminology of L2 transport (your
> > 'host side transport') and L0 transport (your 'guest side transport')?
> > Here we are the L1 guest and L0 is the host and L2 is our nested guest.
> >
>
> I'm confused: if L2 is the nested guest, shouldn't it be the
> 'guest side transport'? Did I miss anything?
>
> Maybe it is another point in favour of your first proposal :)
>
> > >
> > >
> > > The main goal is to find a way to decide which transport to use in these cases:
> > > 1. connect() / sendto()
> > >
> > >     a. use the 'host side transport', if the destination is a guest
> > >        (dest_cid > VMADDR_CID_HOST).
> > >        If we want to support multiple 'host side transports' running at the
> > >        same time, we should assign CIDs uniquely across all transports.
> > >        In this way, a packet generated on the host side will be directed
> > >        to the appropriate transport based on the CID
> >
> > The multiple host side transport case is unlikely to be necessary on x86
> > where only one hypervisor uses VMX at any given time.  But eventually it
> > may happen so it's wise to at least allow it in the design.
> >
>
> Okay, I was in doubt, but I'll keep it in the design.
>
> > >
> > >     b. use the 'guest side transport', if the destination is the host
> > >        (dest_cid == VMADDR_CID_HOST)
> >
> > Makes sense to me.
> >

Agreed. With the addition that VMADDR_CID_HYPERVISOR is also routed as "guest->host/guest side transport".
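
To make sure we mean the same routing rule, here is a minimal user-space
sketch of 1.a/1.b plus the VMADDR_CID_HYPERVISOR case above; the struct
and helper names are made up for illustration and are not the af_vsock
internals:

#include <stdint.h>
#include <stdio.h>

/* CID values from <linux/vm_sockets.h>. */
#define VMADDR_CID_HYPERVISOR 0U
#define VMADDR_CID_HOST       2U

/* Hypothetical transport handles, stubbed out for the sketch. */
struct vsock_transport { const char *name; };

static struct vsock_transport g2h = { "guest->host" };
static struct vsock_transport h2g = { "host->guest" };

/* Hypothetical lookup: which host->guest transport owns this guest CID.
 * With several hypervisor transports loaded, CIDs would have to be
 * assigned uniquely across all of them for this to be unambiguous. */
static struct vsock_transport *host_to_guest_transport(uint32_t dst_cid)
{
	(void)dst_cid;
	return &h2g;
}

/* 1.b plus the note above: VMADDR_CID_HOST and VMADDR_CID_HYPERVISOR
 * go through the guest->host transport; 1.a: anything above
 * VMADDR_CID_HOST is a guest CID and uses a host->guest transport. */
static struct vsock_transport *connect_transport(uint32_t dst_cid)
{
	if (dst_cid == VMADDR_CID_HOST || dst_cid == VMADDR_CID_HYPERVISOR)
		return &g2h;

	return host_to_guest_transport(dst_cid);
}

int main(void)
{
	printf("cid 2 -> %s\n", connect_transport(VMADDR_CID_HOST)->name);
	printf("cid 3 -> %s\n", connect_transport(3)->name);
	return 0;
}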

> > >
> > >
> > > 2. listen() / recvfrom()
> > >
> > >     a. use the 'host side transport', if the socket is bound to
> > >        VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> > >        guest transport.
> > >        We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> > >        address this case.
> > >        If we want to support multiple 'host side transports' running at the
> > >        same time, we should find a way to allow an application to bind to a
> > >        specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> > >        VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> >
> > Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> > VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> > should only be available to a subset of VMware VMs?
>
> You're right, it is not very flexible.

When I was last looking at this, I was considering a proposal where the
incoming traffic would determine which transport to use for CID_ANY in
the case of multiple transports. For stream sockets, we already have a
shared port space, so if we receive a connection request for
<port N, CID_ANY>, that connection would use the transport of the
incoming request. The transport could either be a host->guest transport
or the guest->host transport. This is a bit harder to do for datagrams,
since the VSOCK port is decided by the transport itself today. For VMCI,
a VMCI datagram handler is allocated for each datagram socket, and the
ID of that handler is used as the port. So we would potentially have to
register the same datagram port with all transports.
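
To sketch the stream case (hypothetical names, not the current af_vsock
structures): a listener bound to CID_ANY would not have a transport of
its own; it would simply adopt the transport that delivered the
connection request.

#include <stdint.h>

#define VMADDR_CID_ANY ((uint32_t)-1)   /* from <linux/vm_sockets.h> */

/* Hypothetical types for the sketch. */
struct vsock_transport { const char *name; };

struct listener {
	uint32_t bound_cid;    /* VMADDR_CID_HOST, a guest CID, or CID_ANY */
	uint32_t bound_port;   /* stream ports are shared across transports */
	const struct vsock_transport *transport;   /* unset until decided */
};

/* A connection request for <bound_port, CID_ANY> arrived on 'incoming'
 * (either a host->guest transport or the guest->host transport): the
 * CID_ANY listener adopts the transport that delivered the request,
 * and the child connection keeps using it from then on. */
const struct vsock_transport *
listener_pick_transport(const struct listener *l,
			const struct vsock_transport *incoming)
{
	if (l->bound_cid == VMADDR_CID_ANY)
		return incoming;

	return l->transport;
}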

The use of network namespaces would be complementary to this, and could
be used to partition VMs between hypervisors or at a finer granularity.
This could also be used to isolate host applications from guest
applications using the same ports with CID_ANY, if necessary.

>
> >
> > Instead it might be more appropriate to use network namespaces to create
> > independent AF_VSOCK addressing domains.  Then you could have two
> > separate groups of VMware VMs and selectively listen to just one group.
> >
>
> Does AF_VSOCK support network namespaces, or could that be another
> improvement to take care of? (IIUC it is not currently supported)
>
> A possible issue that I'm seeing with netns is that, if they are used
> for other purposes (e.g. to isolate the network of a VM), we would
> need multiple instances of the application, one per netns.
>
> > >
> > >     b. use the 'guest side transport', if the socket is bound to a local CID
> > >        different from VMADDR_CID_HOST (the guest CID obtained with
> > >        IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> > >        (to be backward compatible).
> > >        Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
> >
> > Two additional topics:
> >
> > 1. How will loading af_vsock.ko change?
>
> I'd allow loading af_vsock.ko without any transport.
> Maybe we should move MODULE_ALIAS_NETPROTO(PF_VSOCK) from
> vmci_transport.ko to af_vsock.ko, but this could impact the VMware
> driver.

As I remember it, this will impact the existing VMware products. I'll have to double check that.
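
For reference, the change being discussed would roughly amount to the
alias living in the core module; a sketch only, assuming nothing else
in vmci_transport.c relies on owning the alias:

/* In net/vmw_vsock/af_vsock.c instead of vmci_transport.c, so that a
 * socket(AF_VSOCK, ...) call auto-loads the core module even before
 * any transport module has registered. */
#include <linux/module.h>
#include <linux/net.h>
#include <linux/socket.h>

MODULE_ALIAS_NETPROTO(PF_VSOCK);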

>
> >    In particular, can an
> >    application create a socket in af_vsock.ko without any loaded
> >    transport?  Can it enter listen state without any loaded transport
> >    (this seems useful with VMADDR_CID_ANY)?
>
> I'll check if we can allow listen sockets without any loaded transport,
> but I think it would be a nice behaviour to have.
>
> >
> > 2. Does your proposed behavior match VMware's existing nested vsock
> >    semantics?
>
> I'm not sure, but I tried to follow Jorgen's answers in the original
> thread. I hope that this proposal matches the VMware semantics.

Yes, the semantics should be preserved.

Thanks,
Jorgen
