Message-ID: <MWHPR11MB18862BF4EA4DC0CFDE6CD2238C769@MWHPR11MB1886.namprd11.prod.outlook.com>
Date: Tue, 6 Apr 2021 00:37:35 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: "Liu, Yi L" <yi.l.liu@...el.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
David Woodhouse <dwmw2@...radead.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
Johannes Weiner <hannes@...xchg.org>,
Jean-Philippe Brucker <jean-philippe@...aro.com>,
"Alex Williamson" <alex.williamson@...hat.com>,
Eric Auger <eric.auger@...hat.com>,
Jonathan Corbet <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>
Subject: RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Tuesday, April 6, 2021 7:35 AM
>
> On Fri, Apr 02, 2021 at 07:30:23AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Friday, April 2, 2021 12:04 AM
> > >
> > > On Thu, Apr 01, 2021 at 02:08:17PM +0000, Liu, Yi L wrote:
> > >
> > > > DMA page faults are delivered to root-complex via page request
> > > > message and it is per-device according to PCIe spec. Page request
> > > > handling flow is:
> > > >
> > > > 1) iommu driver receives a page request from device
> > > > 2) iommu driver parses the page request message. Get the RID, PASID,
> > > > faulted page and requested permissions etc.
> > > > 3) iommu driver triggers fault handler registered by device driver with
> > > > iommu_report_device_fault()
> > >
> > > This seems confused.
> > >
> > > The PASID should define how to handle the page fault, not the driver.
> > >
> > > I don't remember any device specific actions in ATS, so what is the
> > > driver supposed to do?
> > >
> > > > 4) device driver's fault handler signals an event FD to notify userspace
> > > > to fetch the information about the page fault. If it's the VM case,
> > > > inject the page fault into the VM and let the guest resolve it.
> > >
> > > If the PASID is set to 'report page fault to userspace' then some
> > > event should come out of /dev/ioasid, or be reported to a linked
> > > eventfd, or whatever.
> > >
> > > If the PASID is set to 'SVM' then the fault should be passed to
> > > handle_mm_fault
> > >
> > > And so on.
> > >
> > > Userspace chooses what happens based on how they configure the PASID
> > > through /dev/ioasid.
> > >
> > > Why would a device driver get involved here?
> > >
> > > > Eric has sent below series for the page fault reporting for VM with
> > > > passthru device.
> > > > https://lore.kernel.org/kvm/20210223210625.604517-5-eric.auger@...hat.com/
> > >
> > > It certainly should not be in vfio pci. Everything using a PASID needs
> > > this infrastructure, VDPA, mdev, PCI, CXL, etc.
> > >
> >
> > This touches an interesting fact:
> >
> > The fault may be triggered in either 1st-level or 2nd-level page table,
> > when nested translation is enabled (in vSVA case). The 1st-level is bound
> > by the user space, which therefore needs to receive the fault event. The
> > 2nd-level is managed by VFIO (or vDPA), which needs to fix the fault in
> > kernel (e.g. find HVA per faulting GPA, call handle_mm_fault and map
> > GPA->HPA to IOMMU). Yi's current proposal lets VFIO register the
> > device fault handler, which then forwards the event through /dev/ioasid
> > to userspace only if it is a 1st-level fault. Are you suggesting a
> > pgtable-centric fault reporting mechanism to separate handlers in each
> > level, i.e. letting VFIO register a handler only for 2nd-level faults and
> > then /dev/ioasid register a handler for 1st-level faults?
>
> This I'm struggling to understand. /dev/ioasid should handle all the
> fault cases, why would VFIO ever get involved in a fault? What would
> it even do?
>
> If the fault needs to be fixed in the hypervisor then it is a kernel
> fault and it does handle_mm_fault. This absolutely should not be in
> VFIO or VDPA

With nested translation the path is GVA->GPA->HPA. The kernel needs to
fix faults on the GPA->HPA stage (managed by VFIO/VDPA), while
handle_mm_fault only handles HVA->HPA. In this case the 2nd-level
page fault is expected to be delivered to VFIO/VDPA first, which then
finds the HVA backing the GPA, calls handle_mm_fault to fix HVA->HPA,
and then calls iommu_map to fix GPA->HPA in the IOMMU page table.
This is exactly how a CPU EPT violation is handled.
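
To make that sequence concrete, here is a rough userspace model in C. It is
only a sketch and not kernel code: the three tables, the frame-number
indexing, and every function name (model_handle_mm_fault, model_iommu_map,
fix_second_level_fault) are hypothetical stand-ins, assumed here purely to
show the order of the three steps: look up the HVA for the faulting GPA,
fault in HVA->HPA, then install GPA->HPA in the 2nd-level table.

```c
#include <assert.h>
#include <stdint.h>

#define NPAGES  8
#define INVALID UINT64_MAX

/* All tables are page-granular and indexed by frame number (assumption). */
uint64_t gfn_to_vfn[NPAGES];   /* GPA->HVA: hypothetical VFIO/VDPA bookkeeping */
uint64_t vfn_to_hfn[NPAGES];   /* HVA->HPA: stand-in for the CPU page table    */
uint64_t iommu_l2[NPAGES];     /* GPA->HPA: stand-in for the 2nd-level table   */
uint64_t next_free_hfn = 100;  /* toy host page allocator                      */

/* Mark every entry in every table as not present. */
void model_reset(void)
{
    for (int i = 0; i < NPAGES; i++)
        gfn_to_vfn[i] = vfn_to_hfn[i] = iommu_l2[i] = INVALID;
    next_free_hfn = 100;
}

/* Stand-in for handle_mm_fault(): make the HVA page resident and
 * return the host frame that now backs it. */
uint64_t model_handle_mm_fault(uint64_t vfn)
{
    if (vfn_to_hfn[vfn] == INVALID)
        vfn_to_hfn[vfn] = next_free_hfn++;
    return vfn_to_hfn[vfn];
}

/* Stand-in for iommu_map(): install GPA->HPA in the 2nd-level table. */
void model_iommu_map(uint64_t gfn, uint64_t hfn)
{
    iommu_l2[gfn] = hfn;
}

/* 2nd-level fault path as described above.
 * Returns 0 on success, -1 if no HVA backs the faulting GPA. */
int fix_second_level_fault(uint64_t gfn)
{
    if (gfn >= NPAGES || gfn_to_vfn[gfn] == INVALID)
        return -1;                              /* GPA not known to VFIO/VDPA */

    uint64_t vfn = gfn_to_vfn[gfn];             /* step 1: GPA -> HVA         */
    assert(vfn < NPAGES);
    uint64_t hfn = model_handle_mm_fault(vfn);  /* step 2: fix HVA -> HPA     */
    model_iommu_map(gfn, hfn);                  /* step 3: fix GPA -> HPA     */
    return 0;
}
```

The point of the model is that step 1 needs the GPA->HVA information, which
in this sketch lives with whoever owns the 2nd-level mappings.
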
>
> If the fault needs to be fixed in the guest, then it needs to be
> delivered over /dev/ioasid in some way and injected into the
> vIOMMU. VFIO and VDPA have nothing to do with vIOMMU driver in qemu.
>
> You need to have an interface under /dev/ioasid to create both page
> table levels and part of that will be to tell the kernel what VA is
> mapped and how to handle faults.

VFIO/VDPA already have their own interface to manage GPA->HPA
mappings. Why do we want to duplicate it in /dev/ioasid?

Thanks
Kevin