[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB52765C91893A28A7D21D324E8C6B2@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Mon, 8 Jan 2024 04:07:12 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: "Liu, Yi L" <yi.l.liu@...el.com>, "joro@...tes.org" <joro@...tes.org>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"robin.murphy@....com" <robin.murphy@....com>, "baolu.lu@...ux.intel.com"
<baolu.lu@...ux.intel.com>, "cohuck@...hat.com" <cohuck@...hat.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>, "nicolinc@...dia.com"
<nicolinc@...dia.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"mjrosato@...ux.ibm.com" <mjrosato@...ux.ibm.com>,
"chao.p.peng@...ux.intel.com" <chao.p.peng@...ux.intel.com>,
"yi.y.sun@...ux.intel.com" <yi.y.sun@...ux.intel.com>, "peterx@...hat.com"
<peterx@...hat.com>, "jasowang@...hat.com" <jasowang@...hat.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>, "lulu@...hat.com" <lulu@...hat.com>,
"suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, "Duan,
Zhenzhong" <zhenzhong.duan@...el.com>, "joao.m.martins@...cle.com"
<joao.m.martins@...cle.com>, "Zeng, Xin" <xin.zeng@...el.com>, "Zhao, Yan Y"
<yan.y.zhao@...el.com>
Subject: RE: [PATCH v7 1/3] iommufd: Add data structure for Intel VT-d stage-1
cache invalidation
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Friday, January 5, 2024 10:45 PM
>
> On Fri, Jan 05, 2024 at 02:52:50AM +0000, Tian, Kevin wrote:
> > > but in reality the relation could be identified in an easy way due to a SIOV
> > > restriction which we discussed before - shared PASID space of PF
> disallows
> > > assigning sibling vdev's to a same VM (otherwise no way to identify which
> > > sibling vdev triggering an iopf when a pasid is used on both vdev's). That
> > > restriction implies that within an iommufd context every iommufd_device
> > > object should contain a unique struct device pointer. So PASID can be
> > > instead ignored in the lookup then just always do iommufd_get_dev_id()
> > > using struct device.
> >
> > A bit more background.
> >
> > Previously we thought this restriction only applies to SIOV+vSVA, as
> > a guest process may bind to both sibling vdev's, leading to the same
> > pasid situation.
> >
> > In concept w/o vSVA it's still possible to assign sibling vdev's to
> > a same VM as each vdev is allocated with a unique pasid to mark vRID
> > so can be differentiated from each other in the fault/error path.
>
> I thought the SIOV plan was that each "vdev" ie vpci function would
> get a slice of the pRID's PASID space statically selected at creation?
>
> So SVA/etc doesn't matter, you reliably get a disjoint set of pRID &
> pPASID into each VM.
>
> From that view you can't identify the iommufd dev_id without knowing
> both the pRID and pPASID which will disambiguate the different SIOV
> iommufd dev_id instances sharing a rid.
true when assigning those instances to different VMs.
Here I was talking about assigning them to a same VM being a problem.
with rid sharing plus same ENQCMD pPASID potentially used on both
instances there'd be ambiguity in vSVA e.g. iopf to identify dev_id.
we agreed before on preventing sibling vdev's in one VM for above
reason, but only as far as vSVA is concerned.
then given the new finding in err reporting I wondered whether this
restriction should be applied to all SIOV scenarios (but irrelevant now
with below explanation after another thinking)
>
> > But when looking at this err code issue with Yi closely, we found
> > there is another gap in the VT-d spec. Upon devtlb invalidation
> > timeout the hw doesn't report pasid in the error info register. this
> > makes it impossible to identify the source vdev if a hwpt invalidation
> > request involves sibling vdev's from a same PF.
>
> Don't you know which command timed out?
unfortunately no.
for errors related to descriptor fetch the driver can tell the command
by looking at the head pointer of the invalidation queue.
command completion is indirectly detected by inserting a wait descriptor
as fence. completion timeout error is reported in an error register. but
this register doesn't record pasid, nor does the command location. if there
are multiple pending devtlb invalidation commands upon timeout
error the spec suggests the driver to treat all of them timeout as the
register can only record one rid.
this is kind of moot. If the driver submits only one command (plus wait)
at a time it doesn't need hw's help to identify the timeout command.
If the driver batches invalidation commands it must treat all timeout if
an timeout error is reported.
from this angle whether to record pasid doesn't really matter.
intel-iommu driver doesn't batch commands. so it's possible for
the driver to figure out the timeout device itself and identify rid plus
pasid to find dev_id from iommufd.
Thanks
Kevin
Powered by blists - more mailing lists