[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MWHPR11MB1886A7E4C6F3E3A81240517B8C759@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Wed, 7 Apr 2021 08:17:50 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>, Jason Wang <jasowang@...hat.com>
CC: Jean-Philippe Brucker <jean-philippe@...aro.org>,
Alex Williamson <alex.williamson@...hat.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"Jonathan Corbet" <corbet@....net>,
Jean-Philippe Brucker <jean-philippe@...aro.com>,
LKML <linux-kernel@...r.kernel.org>,
"Jiang, Dave" <dave.jiang@...el.com>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"Li Zefan" <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
"Wu, Hao" <hao.wu@...el.com>, David Woodhouse <dwmw2@...radead.org>
Subject: RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation
APIs
> From: Jason Gunthorpe
> Sent: Tuesday, April 6, 2021 8:43 PM
>
> On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:
>
> > > VFIO and VDPA has no buisness having map/unmap interfaces once we
> have
> > > /dev/ioasid. That all belongs in the iosaid side.
> > >
> > > I know they have those interfaces today, but that doesn't mean we have
> > > to keep using them for PASID use cases, they should be replaced with a
> > > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > > 'here is a pasid from /dev/ioasid, go ahead and configure it youself'
> > > interface
> >
> > So it looks like the PASID was bound to SVA in this design. I think it's not
> > necessairly the case:
>
> No, I wish people would stop talking about SVA.
>
> SVA and vSVA are a very special narrow configuration of a PASID. There
> are lots of other PASID configurations! That is the whole point, a
> PASID is complicated, there are many configuration scenarios, they
> need to be in one place with a very clearly defined uAPI
>
I feel it also makes sense to allow a subsystem to specify which configurations
are permitted when allowing a PASID on its device, e.g. excluding things like
GPA mappings that existing subsystems (VFIO/VDPA) already handle well:
- Share GPA mappings between multiple devices (w/ or w/o PASID) for
better IOTLB efficiency;
- Share GPA mappings between transactions w/ PASID and transactions w/o
PASID from the same device (e.g. GPU) for better IOTLB efficiency;
- Use the same page table for GPA mappings before and after the guest
turns on/off the PASID capability;
All above are given as long as we continue to let VFIO/VDPA manage the
iommu domain and associated GPA mappings for PASID. The IOMMU driver
already ensures a nested PASID entry linking to the established GPA paging
structure of the domain when the 1st-level pgtable is bound through
/dev/ioasid.
In contrast, above merits are lost if forcing a model where GPA mappings
for PASID must be constructed through /dev/ioasid, as this will lead to
multiple paging structures for the same GPA mappings implying worse
IOTLB usage and unnecessary cost of invalidations.
Therefore, I envision a scheme where the subsystem could specify
permitted PASID configurations when doing ALLOW_PASID, and then
userspace queries per-PASID capability to learn which operations
are allowed, e.g.:
1) To enable vSVA, VFIO/VDPA allows pgtable binding and related invalidation/
fault ops through /dev/ioasid;
2) for vDPA control vq usage, no configuration is allowed through /dev/ioasid;
3) for new subsystem which doesn't carry any legacy or similar usage as
VFIO/VDPA, it could permit all configurations through /dev/ioasid including
1st-level binding and 2nd-level mapping ops;
This approach also allows us to grow the uAPI in a staging approach. Now
focus on 1) and 2) as VFIO/VDPA are the only two users for now with good
legacy to cover the GPA mappings. More ops can be introduced for 3) when
there is a real example to show what exact ops are required for such a new
subsystem.
Is this a good strategy to move forward?
btw this discussion was raised when discussing the I/O page fault handling
process. Currently the IOMMU layer implements a per-device fault reporting
mechanism, which requires VFIO to register a handler to receive all faults
on its device and then forwards to ioasid if it's due to 1st-level. Possibly it
makes more sense to convert it into a per-pgtable reporting scheme, and
then the owner of each pgtable should register its own handler. It means
for 1) VFIO will register a 2nd-level pgtable handler while /dev/ioasid
will register a 1st-level pgtable handler, while for 3) /dev/ioasid will register
handlers for both 1st-level and 2nd-level pgtable. Jean? also want to know
your thoughts...
Thanks
Kevin
Powered by blists - more mailing lists