Message-ID: <MWHPR11MB18862452FD4172DCA70C89B88C569@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Sat, 8 May 2021 07:31:18 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: Jason Gunthorpe <jgg@...dia.com>, "Liu, Yi L" <yi.l.liu@...el.com>,
"Jacob Pan" <jacob.jun.pan@...ux.intel.com>,
Auger Eric <eric.auger@...hat.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
LKML <linux-kernel@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Jean-Philippe Brucker <jean-philippe@...aro.com>,
Jonathan Corbet <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>
Subject: RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation
APIs
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Saturday, May 8, 2021 1:06 AM
>
> > > Those are the main ones I can think of. It is nice to have a simple
> > > map/unmap interface, I'd hope that a new /dev/ioasid interface wouldn't
> > > raise the barrier to entry too high, but the user needs to have the
> > > ability to have more control of their mappings and locked page
> > > accounting should probably be offloaded somewhere. Thanks,
> > >
> >
> > Based on your feedbacks I feel it's probably reasonable to start with
> > a type1v2 semantics for the new interface. Locked accounting could
> > also start with the same VFIO restriction and then improve it
> > incrementally, if a cleaner way is intrusive (if not affecting uAPI).
> > But I didn't get the suggestion on "more control of their mappings".
> > Can you elaborate?
>
> Things like I note above, userspace cannot currently specify mapping
> granularity nor has any visibility to the granularity they get from the
> IOMMU. What actually happens in the IOMMU is pretty opaque to the user
> currently. Thanks,
>
That's much clearer now. Based on all the discussions so far, I'm thinking
about a staged approach to building the new interface, basically following
the model that Jason pointed out: generic stuff first, then platform-
specific extensions:

Phase 1: /dev/ioasid with core ingredients and vfio type1v2 semantics
    - ioasid is the software handle representing an I/O page table
    - uAPI accepts type1v2 map/unmap semantics per ioasid
    - helpers for VFIO/VDPA to bind ioasid_fd and attach ioasids
    - multiple ioasids are allowed without nesting (vIOMMU, or devices
      w/ incompatible iommu attributes)
    - an ioasid disallows any operation before it's attached to a device
    - an ioasid inherits iommu attributes from the 1st device attached
      to it
    - userspace is expected to manage hardware restrictions and the
      kernel only returns an error when restrictions are broken
        * map/unmap on an ioasid will fail before every device in a group
          is attached to it
        * ioasid attach will fail if the new device has iommu attributes
          incompatible with those of this ioasid
    - thus no group semantics in uAPI
    - no change to vfio container/group/type1 logic, for running existing
      vfio applications
        * implies some duplication between vfio type1 and ioasid for some
          time
    - new uAPI in vfio to allow explicit opening of a device and then
      binding it to the ioasid_fd
        * possibly requires each device to be exposed in /dev/vfio/
    - support both pdev and mdev
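
To make the Phase 1 rules above concrete, here is a toy userspace model of them. It is only a sketch: every name (toy_ioasid, toy_device, toy_group, the attribute fields) and every error code is hypothetical and not a proposed uAPI. It also assumes, for simplicity, that a group's devices all attach to the same ioasid and that no device is attached twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute set; real iommu attributes are richer. */
struct toy_iommu_attr {
	bool enforce_snoop;
	unsigned int addr_width;
};

/* Hypothetical stand-in for an iommu group. */
struct toy_group {
	unsigned int nr_devices;   /* devices that belong to the group */
	unsigned int nr_attached;  /* devices attached so far */
};

struct toy_device {
	struct toy_group *group;
	struct toy_iommu_attr attr;
};

struct toy_ioasid {
	bool has_attr;                     /* set by the 1st attached device */
	struct toy_iommu_attr attr;        /* inherited from that device */
	unsigned int nr_attached;
	unsigned int nr_incomplete_groups; /* groups only partially attached */
};

static bool toy_attr_compatible(const struct toy_iommu_attr *a,
				const struct toy_iommu_attr *b)
{
	return a->enforce_snoop == b->enforce_snoop &&
	       a->addr_width == b->addr_width;
}

/* Attach: the first device defines the attributes, later ones must match. */
int toy_ioasid_attach(struct toy_ioasid *ioas, struct toy_device *dev)
{
	struct toy_group *g;

	assert(ioas && dev && dev->group);
	g = dev->group;

	if (!ioas->has_attr) {
		ioas->attr = dev->attr;
		ioas->has_attr = true;
	} else if (!toy_attr_compatible(&ioas->attr, &dev->attr)) {
		return -EINVAL;  /* error code is an assumption */
	}

	ioas->nr_attached++;
	g->nr_attached++;
	if (g->nr_devices > 1) {
		if (g->nr_attached == 1)
			ioas->nr_incomplete_groups++;
		else if (g->nr_attached == g->nr_devices)
			ioas->nr_incomplete_groups--;
	}
	return 0;
}

/* Map: refused until a device is attached and every group is complete. */
int toy_ioasid_map(const struct toy_ioasid *ioas, unsigned long iova,
		   unsigned long size)
{
	assert(ioas);
	(void)iova;  /* a real implementation would update the page table */

	if (ioas->nr_attached == 0)
		return -ENODEV;  /* error code is an assumption */
	if (ioas->nr_incomplete_groups > 0)
		return -EBUSY;   /* error code is an assumption */
	if (size == 0)
		return -EINVAL;
	return 0;
}
```

Note that the model has no group object on the ioasid side: userspace attaches devices one by one and simply gets an error while a group is incomplete, which is the "no group semantics in uAPI" point.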

Phase 2: ioasid nesting
    - Allow bind/unbind_pgtable semantics per ioasid
    - Allow ioasid nesting
        * HW ioasid nesting if supported by platform
        * otherwise fall back to SW ioasid nesting (in-kernel shadowing)
    - iotlb invalidation per ioasid
    - I/O page fault handling per ioasid
    - hw_id is not exposed in uAPI. Vendor IOMMU driver decides
      when/how hw_id is allocated and programmed properly
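
The HW/SW fallback for nesting in Phase 2 could look roughly like the sketch below. Again, this is a toy model with hypothetical names (toy_nested_ioasid, toy_ioasid_nest, the mode enum) and assumed error codes. The hw_nesting_supported flag stands in for whatever capability check the platform IOMMU driver would really provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_nest_mode {
	TOY_NEST_NONE,       /* not nested */
	TOY_NEST_HW,         /* platform walks both levels */
	TOY_NEST_SW_SHADOW,  /* in-kernel shadowing */
};

struct toy_nested_ioasid {
	struct toy_nested_ioasid *parent;
	enum toy_nest_mode mode;
};

/*
 * Nest 'child' on 'parent'. HW nesting is used when the platform
 * supports it; otherwise the model falls back to SW shadowing.
 */
int toy_ioasid_nest(struct toy_nested_ioasid *child,
		    struct toy_nested_ioasid *parent,
		    bool hw_nesting_supported)
{
	assert(child && parent);

	if (child == parent)
		return -EINVAL;  /* error code is an assumption */
	if (child->parent)
		return -EBUSY;   /* already nested; error code is an assumption */

	child->parent = parent;
	child->mode = hw_nesting_supported ? TOY_NEST_HW : TOY_NEST_SW_SHADOW;
	return 0;
}
```

The caller never selects the mode or sees a hw_id; the choice stays inside the kernel, in line with leaving hw_id allocation to the vendor IOMMU driver.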

Phase 3: optimizations and vendor extensions (order undefined, up to
the specific feature owner):
    - (Intel) ENQCMD support with hw_id exposure in uAPI
    - (ARM/AMD) RID-based pasid table assignment
    - (PPC) window-based iova management
    - Optimizations:
        * replace vfio type1 with a shim driver to use ioasid backend
        * mapping granularity
        * HW dirty page tracking
        * ...

Does the above sound like a sensible plan? If yes, we'll start working on
Phase 1 then...
Thanks
Kevin