Message-ID: <20191121030357.GB16914@ziepe.ca>
Date: Wed, 20 Nov 2019 23:03:57 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Jason Wang <jasowang@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Parav Pandit <parav@...lanox.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
davem@...emloft.net, gregkh@...uxfoundation.org,
Dave Ertman <david.m.ertman@...el.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, nhorman@...hat.com,
sassmann@...hat.com, Kiran Patil <kiran.patil@...el.com>,
Tiwei Bie <tiwei.bie@...el.com>
Subject: Re: [net-next v2 1/1] virtual-bus: Implementation of Virtual Bus
On Wed, Nov 20, 2019 at 03:07:32PM -0700, Alex Williamson wrote:
> > On Wed, Nov 20, 2019 at 10:28:56AM -0700, Alex Williamson wrote:
> > > > > Are you objecting to the mdev_set_iommu_device() stuff here?
> > > >
> > > > I'm questioning if it fits the vfio PCI device security model, yes.
> > >
> > > The mdev IOMMU backing device model is for when an mdev device has
> > > IOMMU based isolation, either via the PCI requester ID or via requester
> > > ID + PASID. For example, an SR-IOV VF may be used by a vendor to
> > > provide IOMMU based translation and isolation, but the VF may not be
> > > complete otherwise to provide a self contained device. It might
> > > require explicit coordination and interaction with the PF driver, ie.
> > > mediation.
> >
> > In this case the PF does not look to be involved; the IFC kernel
> > driver is only manipulating registers in the same VF that vfio
> > owns the IOMMU for.
>
> The mdev_set_iommu_device() call is probably getting caught up in the
> confusion of mdev as it exists today being vfio specific. What I
> described in my reply is vfio specific. The vfio iommu backend is
> currently the only code that calls mdev_get_iommu_device(), JasonW
> doesn't use it in the virtio-mdev code, so this seems like a stray vfio
> specific interface that's setup by IFC but never used.
I couldn't really say; it was the only thing I noticed in IFC that
seemed to have anything to do with identifying which IOMMU group to use
for the vfio interface.
> > This is why I keep calling it a "so-called mediated device" because it
> > is absolutely not clear what the kernel driver is mediating. Nearly
> > all its work is providing a subsystem-style IOCTL interface under the
> > existing vfio multiplexer unrelated to vfio requirements for DMA.
>
> Names don't always evolve well to what an interface becomes, see for
> example vfio. However, even in the vfio sense of mediated devices we
> have protocol translation. The mdev vendor driver translates vfio API
> callbacks into hardware specific interactions. Is this really much
> different?
I think the name was fine if you constrain 'mediated' to mean
'mediated IOMMU'.
Broadening it to cover basically any driver interface starts to overlap
with the role of the driver core and subsystems in Linux.
> > However, to me it feels wrong that just because a driver wishes to use
> > PASID or IOMMU features it should go through vfio and mediated
> > devices.
>
> I don't think I said this. IOMMU backing of an mdev is an acceleration
> feature as far as vfio-mdev is concerned. There are clearly other ways
> to use the IOMMU.
Sorry, I didn't mean to imply you said this; I was merely reflecting
on the mission creep comment below. Often in private conversations
the use of mdev has been justified by 'because it uses the IOMMU'.
> > I feel like mdev is suffering from mission creep. I see people
> > proposing to use mdev for many wild things, the Mellanox SF stuff in
> > the other thread and this 'virtio subsystem' being the two that have
> > come up publicly this month.
>
> Tell me about it... ;)
>
> > Putting some boundaries on mdev usage would really help people know
> > when to use it. My top two from this discussion would be:
> >
> > - mdev devices should only bind to vfio. It is not a general kernel
> > driver matcher mechanism. It is not 'virtual-bus'.
>
> I think this requires the driver-core knowledge to really appreciate.
> Otherwise there's apparently a common need to create sub-devices and
> without closer inspection of the bus:driver API contract, it's too easy
> to try to abstract the device:driver API via the bus. mdev already has
> a notion that the device itself can use any API, but the interface to
> the bus is the vendor provided, vfio compatible callbacks.
But now that we are talking about this, I think there is a pretty
clear opinion forming that if you want to do kernel-to-kernel drivers,
that is 'virtual bus' as proposed in this thread's patch, not mdev.
Adding that knowledge to the mdev documentation would probably help
future people.
> > - mdev & vfio are not a substitute for a proper kernel subsystem. We
> > shouldn't export a complex subsystem-like ioctl API through
> > vfio ioctl extensions. Make a proper subsystem, it is not so hard.
>
> This is not as clear to me, is "ioctl" used once or twice too often or
> are you describing a defined structure of callbacks as an ioctl API?
> The vfio mdev interface is just an extension of the file descriptor
> based vfio device API. The device needs to handle actual ioctls, but
> JasonW's virtio-mdev series had their own set of callbacks. Maybe a
> concrete example of this item would be helpful. Thanks,
I did not intend it to be a clear opinion, more of a vague guide for
documentation. I think as a maintainer you will be asked to make this
call.
The role of a subsystem in Linux is traditionally to take many
different kinds of HW devices and bring them to a common programming
API, provide management and diagnostics, and expose some user ABI to
access the HW.
The role of vfio has traditionally been around secure device
assignment of a HW resource to a VM. I'm not totally clear on what the
role of mdev is seen to be, but all the mdev drivers in the tree seem
to make 'and pass it to KVM' a big part of their description.
So, looking at the virtio patches, I see some intended use is to map
some BAR pages into the VM. I see an ops struct to take different
kinds of HW devices to a common internal kernel API. I understand a
desire to bind kernel drivers that are not vfio to those ops, and I
see a user ioctl ABI based around those ops.
I also understand the BAR map is not registers, but just a write-only
doorbell page. So I suppose any interaction the guest will have with
the device prior to starting DMA is going to be software emulated in
qemu, and relayed into ioctls. (?) Ie. this is no longer strictly
"device assignment" but "accelerated device emulation".
Is virtio more vfio or more subsystem? The biggest thing that points
toward vfio is the intended use. The other items push away.
Frankly, when I look at what this virtio stuff is doing I see RDMA:
- Both have secure BAR pages for mmaping to userspace (or a VM)
- Both are prevented from interacting with the device at a register
level and must call to the kernel - ie creating resources is a
kernel call - for security.
- Both create command request/response rings in userspace controlled
memory and have HW DMA to read requests and DMA to generate responses
- Both allow the work on the rings to DMA outside the ring to
addresses controlled by userspace.
- Both have to support a mixture of HW that uses on-device security
or IOMMU based security.
(I actually gave a talk at Plumbers on how a lot of modern HW is
following the RDMA design patterns; maybe the video will come out soon.)
We've had the same debate with RDMA. Like VFIO, it has an extensible
file descriptor with a driver-specific path that can serve an
unlimited range of uses. We have had to come up with some sensibility
and a definition for "what is RDMA" and what is appropriate for the FD.
Jason