[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB52769EEDAE2783426144E2588C662@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 5 Jan 2024 02:52:50 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>, "Liu, Yi L" <yi.l.liu@...el.com>
CC: "joro@...tes.org" <joro@...tes.org>, "alex.williamson@...hat.com"
<alex.williamson@...hat.com>, "robin.murphy@....com" <robin.murphy@....com>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>, "cohuck@...hat.com"
<cohuck@...hat.com>, "eric.auger@...hat.com" <eric.auger@...hat.com>,
"nicolinc@...dia.com" <nicolinc@...dia.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "mjrosato@...ux.ibm.com" <mjrosato@...ux.ibm.com>,
"chao.p.peng@...ux.intel.com" <chao.p.peng@...ux.intel.com>,
"yi.y.sun@...ux.intel.com" <yi.y.sun@...ux.intel.com>, "peterx@...hat.com"
<peterx@...hat.com>, "jasowang@...hat.com" <jasowang@...hat.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>, "lulu@...hat.com" <lulu@...hat.com>,
"suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, "Duan,
Zhenzhong" <zhenzhong.duan@...el.com>, "joao.m.martins@...cle.com"
<joao.m.martins@...cle.com>, "Zeng, Xin" <xin.zeng@...el.com>, "Zhao, Yan Y"
<yan.y.zhao@...el.com>
Subject: RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1
cache invalidation
> From: Tian, Kevin
> Sent: Friday, January 5, 2024 10:16 AM
>
> > From: Jason Gunthorpe <jgg@...dia.com>
> > Sent: Thursday, January 4, 2024 10:37 PM
> >
> > On Thu, Dec 14, 2023 at 07:26:39PM +0800, Yi Liu wrote:
> > > Per the prior discussion[1], we agreed to move the error reporting into
> the
> > > driver specific part. On Intel side, we want to report two devTLB
> > > invalidation errors: ICE (invalid completion error) and ITE (invalidation
> > > timeout error). Such errors have an additional SID information to tell
> > > which device failed the devTLB invalidation. I've got the below structure.
> >
> > IMHO all of this complexity is a consequence of the decision to hide
> > the devtlb invalidation from the VM..
> >
> > On the other hand I guess you want to do this because of the SIOV
> > troubles where the vPCI function in the VM is entirely virtual and
> > can't be trivially mapped to a real PCI function for ATC invalidation
> > like ARM and AMD can do (but they also can't support SIOV because of
> > this). :(
> >
> > However it also makes it very confusing about how the VM would
> > perceive an error - eg if it invalidates an SIOV device single PASID
> > and that devtlb fails then the error should be connected back to the
> > vPCI function for the SIOV's specific PASID and not back to the
> > physical PCI function for the SIOV owner.
> >
> > As the iommu driver itself has no idea about the vPCI functions this
> > seems like it is going to get really confusing. The API I suggested in
> > the other email is not entirely going to work as the vPCI function for
> > SIOV cases will have to be identified by the (struct device, PASID) -
> > while it would be easy enough for the iommu driver to provide the
> > PASID, I'm not sure how the iommufd core will relate the PASID back to
> > the iommu_device to understand SIOV without actually being aware of
> > SIOV to some degree :\
>
> we plan to add such awareness with a new binding helper:
>
> https://lore.kernel.org/lkml/20231009085123.463179-4-yi.l.liu@intel.com/
>
> and with metadata to track association between PASID's and iommufd vdev.
>
> but in reality the relation could be identified in an easy way due to a SIOV
> restriction which we discussed before - shared PASID space of PF disallows
> assigning sibling vdev's to a same VM (otherwise no way to identify which
> sibling vdev triggering an iopf when a pasid is used on both vdev's). That
> restriction implies that within an iommufd context every iommufd_device
> object should contain a unique struct device pointer. So PASID can be
> instead ignored in the lookup then just always do iommufd_get_dev_id()
> using struct device.
A bit more background.
Previously we thought this restriction only applies to SIOV+vSVA, as
a guest process may bind to both sibling vdev's, leading to the same
pasid situation.
In concept w/o vSVA it's still possible to assign sibling vdev's to
a same VM as each vdev is allocated with a unique pasid to mark vRID
so can be differentiated from each other in the fault/error path.
But when looking at this err code issue with Yi closely, we found
there is another gap in the VT-d spec. Upon devtlb invalidation
timeout the hw doesn't report pasid in the error info register. this
makes it impossible to identify the source vdev if a hwpt invalidation
request involves sibling vdev's from a same PF.
with that I'm inclined to always imposing this restriction for SIOV.
One may argue that SIOV w/o vSVA w/o devtlb is conceptually immune
but I'm with you that given SIOVr1 is one-off I prefer to limiting its
usability other than complexing the kernel.
Powered by blists - more mailing lists