Message-ID: <MWHPR11MB1886172080807517E92A8EF68C3D9@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Wed, 2 Jun 2021 01:25:00 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: LKML <linux-kernel@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
"Lu Baolu" <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Alex Williamson (alex.williamson@...hat.com)"
<alex.williamson@...hat.com>, Jason Wang <jasowang@...hat.com>,
Eric Auger <eric.auger@...hat.com>,
Jonathan Corbet <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
David Gibson <david@...son.dropbear.id.au>,
Kirti Wankhede <kwankhede@...dia.com>,
"Robin Murphy" <robin.murphy@....com>
Subject: RE: [RFC] /dev/ioasid uAPI proposal
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, June 2, 2021 4:29 AM
>
> On Tue, Jun 01, 2021 at 07:01:57AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Saturday, May 29, 2021 4:03 AM
> > >
> > > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> > > > /dev/ioasid provides a unified interface for managing I/O page tables for
> > > > devices assigned to userspace. Device passthrough frameworks (VFIO, vDPA,
> > > > etc.) are expected to use this interface instead of creating their own logic
> > > > to isolate untrusted device DMAs initiated by userspace.
> > >
> > > It is very long, but I think this has turned out quite well. It
> > > certainly matches the basic sketch I had in my head when we were
> > > talking about how to create vDPA devices a few years ago.
> > >
> > > When you get down to the operations they all seem pretty common sense
> > > and straightforward. Create an IOASID. Connect to a device. Fill the
> > > IOASID with pages somehow. Worry about PASID labeling.
> > >
> > > It really is critical to get all the vendor IOMMU people to go over it
> > > and see how their HW features map into this.
> > >
> >
> > Agree. btw I feel it might be good to have several design opens
> > centrally discussed after going through all the comments. Otherwise
> > they may be buried in different sub-threads and potentially receive
> > insufficient attention (especially from people who haven't finished
> > reading the whole thread).
> >
> > I summarized five opens here, about:
> >
> > 1) Finalizing the name to replace /dev/ioasid;
> > 2) Whether one device is allowed to bind to multiple IOASID fd's;
> > 3) Whether to carry device information in the invalidation/fault reporting uAPI;
> > 4) What should/could be specified when allocating an IOASID;
> > 5) The protocol between vfio group and kvm;
> >
> > For 1), two alternative names are mentioned: /dev/iommu and
> > /dev/ioas. I don't have a strong preference and would like to hear
> > votes from all stakeholders. /dev/iommu is slightly better imho for
> > two reasons. First, per AMD's presentation at the last KVM Forum they
> > implement vIOMMU in hardware and thus need to support user-managed
> > domains. An iommu uAPI notation might make more sense moving
> > forward. Second, it makes later uAPI naming easier as 'IOASID' can
> > always be treated as an object, e.g. IOMMU_ALLOC_IOASID instead of
> > IOASID_ALLOC_IOASID. :)
>
> I think two years ago I suggested /dev/iommu and it didn't go very far
> at the time. We've also talked about this as /dev/sva for a while and
> now /dev/ioasid.
>
> I think /dev/iommu is fine, and call the things inside them IOAS
> objects.
>
> Then we don't have naming aliasing with kernel constructs.
>
> > For 2), Jason prefers not to block it unless there is a kernel design reason. If
> > one device is allowed to bind multiple IOASID fd's, the main problem
> > is about cross-fd IOASID nesting, e.g. having gpa_ioasid created in fd1
> > and giova_ioasid created in fd2 and then nesting them together (and
>
> Huh? This can't happen
>
> Creating an IOASID is an operation on the /dev/ioasid FD. We won't
> provide APIs to create a tree of IOASID's outside a single FD container.
>
> If a device can consume multiple IOASID's it doesn't care how many or
> what /dev/ioasid FDs they come from.
OK, this implies that if one user inadvertently creates an intended parent/
child pair via different fd's then the operation will simply fail. More
specifically, take ARM's case as an example. There is only a single 2nd-level
I/O page table per device (nested by multiple 1st-level tables). Say the user
already created a gpa_ioasid for a device via fd1. Now he binds the device to
fd2, intending to enable vSVA, which requires nested translation and thus
needs a parent created via fd2. This parent creation will simply be rejected
by the IOMMU layer because the 2nd-level (via fd1) is already installed for
this device.
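
To make the failure mode concrete, below is a rough userspace sketch of the
sequence I have in mind. Every helper name in it is hypothetical and only for
illustration; the real uAPI is exactly what we are still defining here.

    /*
     * Illustrative pseudo-code only: ioasid_alloc(), ioasid_alloc_nested()
     * and device_attach() are hypothetical wrappers, not proposed uAPI.
     */
    int fd1 = open("/dev/ioasid", O_RDWR);
    int fd2 = open("/dev/ioasid", O_RDWR);

    /* stage-2 (GPA) address space is created in fd1 and attached */
    int gpa_ioasid = ioasid_alloc(fd1);
    device_attach(device_fd, fd1, gpa_ioasid);

    /* later the same device is bound to fd2 to set up vSVA nesting there */
    int giova_ioasid = ioasid_alloc_nested(fd2, /* parent */ gpa_ioasid);
    /*
     * -> must fail: the parent lives in fd1 so fd2 cannot reference it,
     * and the device's single 2nd-level table is already owned via fd1.
     */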
>
> > At the other end, there was also the thought of making a single I/O
> > address space per IOASID fd. It was discussed in a previous thread
> > that #fd's are insufficient to afford the theoretical 1M address
> > spaces per device. But let's have another revisit and draw a clear
> > conclusion on whether this option is viable.
>
> I had remarks on this, I think per-fd doesn't work
>
> > This implies that VFIO_BOUND_IOASID will be extended to allow the user
> > to specify a device label. This label will be recorded in /dev/iommu to
> > serve per-device invalidation requests from, and report per-device
> > fault data to, the user.
>
> I wonder whether the user providing a 64-bit cookie or the kernel
> returning a small IDA is the better choice here? Both have merits
> depending on what qemu needs.
Yes, either way can work. I don't have a strong preference. Jean?
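
For illustration, the two labeling options would look roughly like below. The
struct layouts and names are made up for the sake of discussion, not proposed
uAPI.

    /* Option A: user supplies a 64-bit cookie at bind time (hypothetical) */
    struct ioasid_device_bind_cookie {
            __u32   argsz;
            __u32   flags;
            __u64   user_cookie;    /* opaque to the kernel, e.g. qemu's vBDF;
                                       echoed back in fault/invalidation msgs */
    };

    /* Option B: kernel allocates and returns a small per-fd device id */
    struct ioasid_device_bind_id {
            __u32   argsz;
            __u32   flags;
            __u32   dev_id;         /* out: small IDA-allocated integer used
                                       in later fault/invalidation msgs */
            __u32   __reserved;
    };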
>
> > In addition, the vPASID (if provided by the user) will
> > also be recorded in /dev/iommu so that vPASID<->pPASID conversion
> > is conducted properly, e.g. an invalidation request from the user carries
> > a vPASID which must be converted into a pPASID before calling the iommu
> > driver. Vice versa for raw fault data, which carries a pPASID while the
> > user expects a vPASID.
>
> I don't think the PASID should be returned at all. It should return
> the IOASID number in the FD and/or a u64 cookie associated with that
> IOASID. Userspace should figure out what the IOASID & device
> combination means.
This is true for Intel. But what about ARM, which has only one IOASID
(the PASID table) per device to represent all guest I/O page tables?
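
The conversion I have in mind is just a per-device lookup at the uAPI
boundary. A minimal kernel-side sketch (the struct and helper names are
hypothetical, only to show the idea):

    #include <linux/xarray.h>

    /* hypothetical per-device state kept by /dev/iommu for a bound device */
    struct ioasid_dev_label {
            struct xarray vpasid_map;       /* vPASID -> pPASID, entries
                                               stored with xa_mk_value() */
    };

    static int vpasid_to_ppasid(struct ioasid_dev_label *label,
                                u32 vpasid, u32 *ppasid)
    {
            void *entry = xa_load(&label->vpasid_map, vpasid);

            if (!entry)
                    return -EINVAL; /* user passed an unknown vPASID */
            *ppasid = xa_to_value(entry);
            return 0;
    }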
>
> > It seems that to close this design open we have to touch the kAPI design,
> > and Joerg's input is highly appreciated here.
>
> uAPI is forever, the kAPI is constantly changing. I always dislike
> warping the uAPI based on the current kAPI situation.
>
I got this point. My point was that I didn't see a significant gain from
either option, so to better compare the two uAPI options we might want to
also consider the involved kAPI effort as another factor.
Thanks
Kevin