linux-kernel - Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <513c62ba-3f44-f4cf-3b3d-e0e03b6a6de1@redhat.com>
Date:   Fri, 5 Jul 2019 08:30:00 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Tiwei Bie <tiwei.bie@...el.com>
Cc:     mst@...hat.com, alex.williamson@...hat.com,
        maxime.coquelin@...hat.com, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
        netdev@...r.kernel.org, dan.daly@...el.com,
        cunming.liang@...el.com, zhihong.wang@...el.com
Subject: Re: [RFC v2] vhost: introduce mdev based hardware vhost backend


On 2019/7/4 下午3:02, Tiwei Bie wrote:
> On Thu, Jul 04, 2019 at 02:35:20PM +0800, Jason Wang wrote:
>> On 2019/7/4 下午2:21, Tiwei Bie wrote:
>>> On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
>>>> On 2019/7/3 下午9:08, Tiwei Bie wrote:
>>>>> On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
>>>>>> On 2019/7/3 下午7:52, Tiwei Bie wrote:
>>>>>>> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>>>>>>>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>>>>>>>> Details about this can be found here:
>>>>>>>>>
>>>>>>>>> https://lwn.net/Articles/750770/
>>>>>>>>>
>>>>>>>>> What's new in this version
>>>>>>>>> ==========================
>>>>>>>>>
>>>>>>>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>>>>>>>> some comments from here:https://patchwork.ozlabs.org/cover/984763/
>>>>>>>>>
>>>>>>>>> Below is the updated device interface:
>>>>>>>>>
>>>>>>>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>>>>>>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>>>>>>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>>>>>>>> can be used to notify the device.
>>>>>>>>>
>>>>>>>>> 1. CONFIG_REGION
>>>>>>>>>
>>>>>>>>> The region described by CONFIG_REGION is the main control interface.
>>>>>>>>> Messages will be written to or read from this region.
>>>>>>>>>
>>>>>>>>> The message type is determined by the `request` field in message
>>>>>>>>> header. The message size is encoded in the message header too.
>>>>>>>>> The message format looks like this:
>>>>>>>>>
>>>>>>>>> struct vhost_vfio_op {
>>>>>>>>> 	__u64 request;
>>>>>>>>> 	__u32 flags;
>>>>>>>>> 	/* Flag values: */
>>>>>>>>>       #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>>>>>>>> 	__u32 size;
>>>>>>>>> 	union {
>>>>>>>>> 		__u64 u64;
>>>>>>>>> 		struct vhost_vring_state state;
>>>>>>>>> 		struct vhost_vring_addr addr;
>>>>>>>>> 	} payload;
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> The existing vhost-kernel ioctl cmds are reused as the message
>>>>>>>>> requests in above structure.
>>>>>>>> Still a comments like V1. What's the advantage of inventing a new protocol?
>>>>>>> I'm trying to make it work in VFIO's way..
>>>>>>>
>>>>>>>> I believe either of the following should be better:
>>>>>>>>
>>>>>>>> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
>>>>>>>> extend it with e.g notify region. The advantages is that all exist userspace
>>>>>>>> program could be reused without modification (or minimal modification). And
>>>>>>>> vhost API hides lots of details that is not necessary to be understood by
>>>>>>>> application (e.g in the case of container).
>>>>>>> Do you mean reusing vhost's ioctl on VFIO device fd directly,
>>>>>>> or introducing another mdev driver (i.e. vhost_mdev instead of
>>>>>>> using the existing vfio_mdev) for mdev device?
>>>>>> Can we simply add them into ioctl of mdev_parent_ops?
>>>>> Right, either way, these ioctls have to be and just need to be
>>>>> added in the ioctl of the mdev_parent_ops. But another thing we
>>>>> also need to consider is that which file descriptor the userspace
>>>>> will do the ioctl() on. So I'm wondering do you mean let the
>>>>> userspace do the ioctl() on the VFIO device fd of the mdev
>>>>> device?
>>>>>
>>>> Yes.
>>> Got it! I'm not sure what's Alex opinion on this. If we all
>>> agree with this, I can do it in this way.
>>>
>>>> Is there any other way btw?
>>> Just a quick thought.. Maybe totally a bad idea.
>>
>> It's not for sure :)
> Thanks!
>
>>
>>>    I was thinking
>>> whether it would be odd to do non-VFIO's ioctls on VFIO's device
>>> fd. So I was wondering whether it's possible to allow binding
>>> another mdev driver (e.g. vhost_mdev) to the supported mdev
>>> devices. The new mdev driver, vhost_mdev, can provide similar
>>> ways to let userspace open the mdev device and do the vhost ioctls
>>> on it. To distinguish with the vfio_mdev compatible mdev devices,
>>> the device API of the new vhost_mdev compatible mdev devices
>>> might be e.g. "vhost-net" for net?
>>>
>>> So in VFIO case, the device will be for passthru directly. And
>>> in VHOST case, the device can be used to accelerate the existing
>>> virtualized devices.
>>>
>>> How do you think?
>>
>> If my understanding is correct, there will be no VFIO ioctl if we go for
>> vhost_mdev?
> Yeah, exactly. If we go for vhost_mdev, we may have some vhost nodes
> in /dev similar to what /dev/vfio/* does to handle the $UUID and open
> the device (e.g. similar to VFIO_GROUP_GET_DEVICE_FD in VFIO). And
> to setup the device, we can try to reuse the ioctls of the existing
> kernel vhost as much as possible.


Interesting, actually, I've considered something similar. I think there 
should be no issues other than DMA:

- Need to invent new API for DMA mapping other than SET_MEM_TABLE? 
(Which is too heavyweight).

- Need to consider a way to co-work with both on chip IOMMU (your 
proposal should be fine) and scalable IOV.

Thanks


>
> Thanks,
> Tiwei
>
>> Thanks
>>
>>
>>> Thanks,
>>> Tiwei
>>>> Thanks
>>>>