Message-ID: <20220111080359-mutt-send-email-mst@kernel.org>
Date: Tue, 11 Jan 2022 08:04:10 -0500
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Yongji Xie <xieyongji@...edance.com>
Cc: Jason Wang <jasowang@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>,
Parav Pandit <parav@...dia.com>,
Christoph Hellwig <hch@...radead.org>,
Christian Brauner <christian.brauner@...onical.com>,
Randy Dunlap <rdunlap@...radead.org>,
Matthew Wilcox <willy@...radead.org>,
Al Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>, bcrl@...ck.org,
Jonathan Corbet <corbet@....net>,
Mika Penttilä <mika.penttila@...tfour.com>,
Dan Carpenter <dan.carpenter@...cle.com>, joro@...tes.org,
Greg KH <gregkh@...uxfoundation.org>,
He Zhe <zhe.he@...driver.com>,
Liu Xiaodong <xiaodong.liu@...el.com>,
Joe Perches <joe@...ches.com>,
Robin Murphy <robin.murphy@....com>,
Will Deacon <will@...nel.org>,
John Garry <john.garry@...wei.com>, songmuchun@...edance.com,
virtualization <virtualization@...ts.linux-foundation.org>,
Netdev <netdev@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, iommu@...ts.linux-foundation.org,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v12 00/13] Introduce VDUSE - vDPA Device in Userspace
On Tue, Jan 11, 2022 at 08:57:49PM +0800, Yongji Xie wrote:
> On Tue, Jan 11, 2022 at 7:54 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> >
> > On Tue, Jan 11, 2022 at 11:31:37AM +0800, Yongji Xie wrote:
> > > On Mon, Jan 10, 2022 at 11:44 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > >
> > > > On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> > > > > On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > > > >
> > > > > > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > > > > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin <mst@...hat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > > > > > This series introduces a framework that makes it possible to implement
> > > > > > > > > software-emulated vDPA devices in userspace. And to make the device
> > > > > > > > > emulation more secure, the emulated vDPA device's control path is handled
> > > > > > > > > in the kernel and only the data path is implemented in the userspace.
> > > > > > > > >
> > > > > > > > > > Since the emulated vDPA device's control path is handled in the kernel,
> > > > > > > > > > a message mechanism is introduced to make userspace aware of data
> > > > > > > > > > path related changes. Userspace can use read()/write() to receive and
> > > > > > > > > > reply to the control messages.
> > > > > > > > >
> > > > > > > > > > In the data path, the core task is mapping the DMA buffer into the VDUSE
> > > > > > > > > > daemon's address space, which can be implemented in different ways depending
> > > > > > > > > > on the vdpa bus to which the vDPA device is attached.
> > > > > > > > >
> > > > > > > > > > In the virtio-vdpa case, we implement an MMU-based software IOTLB with a
> > > > > > > > > > bounce-buffering mechanism to achieve that. In the vhost-vdpa case, the DMA
> > > > > > > > > > buffer resides in a userspace memory region which can be shared with the
> > > > > > > > > > VDUSE userspace process by transferring the shmfd.
> > > > > > > > >
> > > > > > > > > > The details and our use case are shown below:
> > > > > > > > >
> > > > > > > > > ------------------------ ------------------------- ----------------------------------------------
> > > > > > > > > | Container | | QEMU(VM) | | VDUSE daemon |
> > > > > > > > > | --------- | | ------------------- | | ------------------------- ---------------- |
> > > > > > > > > | |dev/vdx| | | |/dev/vhost-vdpa-x| | | | vDPA device emulation | | block driver | |
> > > > > > > > > ------------+----------- -----------+------------ -------------+----------------------+---------
> > > > > > > > > | | | |
> > > > > > > > > | | | |
> > > > > > > > > ------------+---------------------------+----------------------------+----------------------+---------
> > > > > > > > > | | block device | | vhost device | | vduse driver | | TCP/IP | |
> > > > > > > > > | -------+-------- --------+-------- -------+-------- -----+---- |
> > > > > > > > > | | | | | |
> > > > > > > > > | ----------+---------- ----------+----------- -------+------- | |
> > > > > > > > > | | virtio-blk driver | | vhost-vdpa driver | | vdpa device | | |
> > > > > > > > > | ----------+---------- ----------+----------- -------+------- | |
> > > > > > > > > | | virtio bus | | | |
> > > > > > > > > | --------+----+----------- | | | |
> > > > > > > > > | | | | | |
> > > > > > > > > | ----------+---------- | | | |
> > > > > > > > > | | virtio-blk device | | | | |
> > > > > > > > > | ----------+---------- | | | |
> > > > > > > > > | | | | | |
> > > > > > > > > | -----------+----------- | | | |
> > > > > > > > > | | virtio-vdpa driver | | | | |
> > > > > > > > > | -----------+----------- | | | |
> > > > > > > > > | | | | vdpa bus | |
> > > > > > > > > | -----------+----------------------+---------------------------+------------ | |
> > > > > > > > > | ---+--- |
> > > > > > > > > -----------------------------------------------------------------------------------------| NIC |------
> > > > > > > > > ---+---
> > > > > > > > > |
> > > > > > > > > ---------+---------
> > > > > > > > > | Remote Storages |
> > > > > > > > > -------------------
> > > > > > > > >
> > > > > > > > > > We make use of it to implement a block device connecting to
> > > > > > > > > > our distributed storage, which can be used both in containers and
> > > > > > > > > > VMs. Thus, we can have a unified technology stack in these two cases.
> > > > > > > > >
> > > > > > > > > To test it with null-blk:
> > > > > > > > >
> > > > > > > > > $ qemu-storage-daemon \
> > > > > > > > > --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> > > > > > > > > --monitor chardev=charmonitor \
> > > > > > > > > --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> > > > > > > > > --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> > > > > > > > >
> > > > > > > > > The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> > > > > > > >
> > > > > > > > It's been half a year - any plans to upstream this?
> > > > > > >
> > > > > > > Yeah, this is on my to-do list this month.
> > > > > > >
> > > > > > > Sorry for taking so long... I've been working on another project
> > > > > > > enabling userspace RDMA with VDUSE for the past few months. So I
> > > > > > > didn't have much time for this. Anyway, I will submit the first
> > > > > > > version as soon as possible.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yongji
> > > > > >
> > > > > > Oh fun. You mean like virtio-rdma? Or RDMA as a backend for regular
> > > > > > virtio?
> > > > > >
> > > > >
> > > > > Yes, like virtio-rdma. Then we can develop something like userspace
> > > > > > rxe, siw, or a custom protocol with VDUSE.
> > > > >
> > > > > Thanks,
> > > > > Yongji
> > > >
> > > > Would be interesting to see the spec for that.
> > >
> > > Will send it ASAP.
> > >
> > > > The issues with RDMA revolved around the fact that current
> > > > > apps tend to either use non-standard protocols for connection
> > > > establishment or use UD where there's IIRC no standard
> > > > at all. So QP numbers are hard to virtualize.
> > > > Similarly many use LIDs directly with the same effect.
> > > > > GUIDs might be virtualizable but no one went to the effort.
> > > >
> > >
> > > > Actually we aimed at emulating a soft RDMA device with a normal NIC (not
> > > > using RDMA capability) rather than virtualizing a physical RDMA NIC into
> > > several vRDMA devices. If so, I think we won't have those issues,
> > > right?
> >
> > Right, maybe you won't.
> >
> > > > To say nothing about the interaction with memory overcommit.
> > > >
> > >
> > > I don't get you here. Could you give me more details?
> > >
> > > Thanks,
> > > Yongji
> >
> > RDMA devices tend to want to pin the memory under DMA.
> >
>
> I see. Maybe something like dm or odp could be helpful.
>
> Thanks,
> Yongji
Yes sure.
--
MST