[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5dcef4ab-feb5-d116-b2a9-50608784a054@redhat.com>
Date: Thu, 21 Nov 2019 15:21:29 +0800
From: Jason Wang <jasowang@...hat.com>
To: Jason Gunthorpe <jgg@...pe.ca>,
Alex Williamson <alex.williamson@...hat.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Parav Pandit <parav@...lanox.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
davem@...emloft.net, gregkh@...uxfoundation.org,
Dave Ertman <david.m.ertman@...el.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, nhorman@...hat.com,
sassmann@...hat.com, Kiran Patil <kiran.patil@...el.com>,
Tiwei Bie <tiwei.bie@...el.com>
Subject: Re: [net-next v2 1/1] virtual-bus: Implementation of Virtual Bus
On 2019/11/21 上午11:03, Jason Gunthorpe wrote:
> On Wed, Nov 20, 2019 at 03:07:32PM -0700, Alex Williamson wrote:
>
>>> On Wed, Nov 20, 2019 at 10:28:56AM -0700, Alex Williamson wrote:
>>>>>> Are you objecting the mdev_set_iommu_deivce() stuffs here?
>>>>> I'm questioning if it fits the vfio PCI device security model, yes.
>>>> The mdev IOMMU backing device model is for when an mdev device has
>>>> IOMMU based isolation, either via the PCI requester ID or via requester
>>>> ID + PASID. For example, an SR-IOV VF may be used by a vendor to
>>>> provide IOMMU based translation and isolation, but the VF may not be
>>>> complete otherwise to provide a self contained device. It might
>>>> require explicit coordination and interaction with the PF driver, ie.
>>>> mediation.
>>> In this case the PF does not look to be involved, the ICF kernel
>>> driver is only manipulating registers in the same VF that the vfio
>>> owns the IOMMU for.
>> The mdev_set_iommu_device() call is probably getting caught up in the
>> confusion of mdev as it exists today being vfio specific. What I
>> described in my reply is vfio specific. The vfio iommu backend is
>> currently the only code that calls mdev_get_iommu_device(), JasonW
>> doesn't use it in the virtio-mdev code, so this seems like a stray vfio
>> specific interface that's setup by IFC but never used.
> I couldn't really say, it was the only thing I noticed in IFC that
> seemed to have anything to do with identifying what IOMMU group to use
> for the vfio interface..
>
>>> This is why I keep calling it a "so-called mediated device" because it
>>> is absolutely not clear what the kernel driver is mediating. Nearly
>>> all its work is providing a subsystem-style IOCTL interface under the
>>> existing vfio multiplexer unrelated to vfio requirements for DMA.
>> Names don't always evolve well to what an interface becomes, see for
>> example vfio. However, even in the vfio sense of mediated devices we
>> have protocol translation. The mdev vendor driver translates vfio API
>> callbacks into hardware specific interactions. Is this really much
>> different?
> I think the name was fine if you constrain 'mediated' to mean
> 'mediated IOMMU'
But actually it does much more than just IOMMU.
>
> Broading to be basically any driver interface is starting to overlap
> with the role of the driver core and subsystems in Linux.
>
>>> However, to me it feels wrong that just because a driver wishes to use
>>> PASID or IOMMU features it should go through vfio and mediated
>>> devices.
>> I don't think I said this. IOMMU backing of an mdev is an acceleration
>> feature as far as vfio-mdev is concerned. There are clearly other ways
>> to use the IOMMU.
> Sorry, I didn't mean to imply you said this, I was mearly reflecting
> on the mission creep comment below. Often in private converstations
> the use of mdev has been justified by 'because it uses IOMMU'
>
>>> I feel like mdev is suffering from mission creep. I see people
>>> proposing to use mdev for many wild things, the Mellanox SF stuff in
>>> the other thread and this 'virtio subsystem' being the two that have
>>> come up publicly this month.
>> Tell me about it... ;)
>>
>>> Putting some boundaries on mdev usage would really help people know
>>> when to use it. My top two from this discussion would be:
>>>
>>> - mdev devices should only bind to vfio. It is not a general kernel
>>> driver matcher mechanism. It is not 'virtual-bus'.
>> I think this requires the driver-core knowledge to really appreciate.
>> Otherwise there's apparently a common need to create sub-devices and
>> without closer inspection of the bus:driver API contract, it's too easy
>> to try to abstract the device:driver API via the bus. mdev already has
>> a notion that the device itself can use any API, but the interface to
>> the bus is the vendor provided, vfio compatible callbacks.
> But now that we are talking about this, I think there is a pretty
> clear opinion forming that if you want to do kernel-kernel drivers
> that is 'virtual bus' as proposed in this threads patch, not mdev.
This looks confused.
1) Virtual bus allows multiple different type of devices to be attached
on a single bus, isn't this where you show your concern when I do
similar thing for a single mdev-bus?
2) Virtual bus hide the communication through a void *, this is not
impossible for mdev.
3) After decoupling vfio out of mdev, there's no fundamental difference
between mdev and virtual bus except that mdev is coupled with sysfs
interface and can talk to VFIO. And that's really what we want for
virtio, not only for having a management interface but also for a
unified framework/API between vhost(userspace) and virtio(kernel) driver.
4) In the cover letter of virtual-bus it said:
"
+One use case example is an rdma driver needing to connect with several
+different types of PCI LAN devices to be able to request resources from
+them (queue sets). Each LAN driver that supports rdma will register a
+virtbus_device on the virtual bus for each physical function. The rdma
+driver will register as a virtbus_driver on the virtual bus to be
+matched up with multiple virtbus_devices and receive a pointer to a
+struct containing the callbacks that the PCI LAN drivers support for
+registering with them.
"
It did something like device aggregation. Ok, you might think it could
be extended. But why mdev can't be extended?
>
> Adding that knowledge to the mdev documentation would probably help
> future people.
>
>>> - mdev & vfio are not a substitute for a proper kernel subsystem. We
>>> shouldn't export a complex subsystem-like ioctl API through
>>> vfio ioctl extensions. Make a proper subsystem, it is not so hard.
>> This is not as clear to me, is "ioctl" used once or twice too often or
>> are you describing a defined structure of callbacks as an ioctl API?
>> The vfio mdev interface is just an extension of the file descriptor
>> based vfio device API. The device needs to handle actual ioctls, but
>> JasonW's virtio-mdev series had their own set of callbacks. Maybe a
>> concrete example of this item would be helpful. Thanks,
> I did not intend it to be a clear opinion, more of a vauge guide for
> documentation. I think as a maintainer you will be asked to make this
> call.
>
> The role of a subsystem in Linux is traditionally to take many
> different kinds of HW devices and bring them to a common programming
> API. Provide management and diagnostics, and expose some user ABI to
> access the HW.
>
> The role of vfio has traditionally been around secure device
> assignment of a HW resource to a VM. I'm not totally clear on what the
> role if mdev is seen to be, but all the mdev drivers in the tree seem
> to make 'and pass it to KVM' a big part of their description.
>
> So, looking at the virtio patches, I see some intended use is to map
> some BAR pages into the VM.
Nope, at least not for the current stage. It still depends on the
virtio-net-pci emulatio in qemu to work. In the future, we will allow
such mapping only for dorbell.
> I see an ops struct to take different
> kinds of HW devices to a common internal kernel API. I understand a
> desire to bind kernel drivers that are not vfio to those ops, and I
> see a user ioctl ABI based around those ops.
>
> I also understand the BAR map is not registers, but just a write-only
> doorbell page. So I suppose any interaction the guest will have with
> the device prior to starting DMA is going to be software emulated in
> qemu, and relayed into ioctls. (?) ie this is no longer strictly
> "device assignment" but "accelerated device emulation".
>
> Is virtio more vfio or more subsystem? The biggest thing that points
> toward vfio is the intended use. The other items push away.
>
> Frankly, when I look at what this virtio stuff is doing I see RDMA:
> - Both have a secure BAR pages for mmaping to userspace (or VM)
> - Both are prevented from interacting with the device at a register
> level and must call to the kernel - ie creating resources is a
> kernel call - for security.
> - Both create command request/response rings in userspace controlled
> memory and have HW DMA to read requests and DMA to generate responses
> - Both allow the work on the rings to DMA outside the ring to
> addresses controlled by userspace.
> - Both have to support a mixture of HW that uses on-device security
> or IOMMU based security.
>
> (I actually gave a talk on how alot of modern HW is following the RDMA
> design patterns at plumbers, maybe video will come out soon)
>
> We've had the same debate with RDMA. Like VFIO it has an extensible
> file descriptor with a driver-specific path that can serve an
> unlimited range of uses. We have had to come up with some sensability
> and definition for "what is RDMA" and is appropriate for the FD.
If you looking at my V13, after decoupling, you can register your own
vhost driver on the mdev_virito bus with your own API if you don't like
VFIO.
Thanks
>
> Jason
>
Powered by blists - more mailing lists