[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB5433519229319BA951CA97638CAA9@BN9PR11MB5433.namprd11.prod.outlook.com>
Date: Thu, 30 Sep 2021 09:35:45 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: Alex Williamson <alex.williamson@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "hch@....de" <hch@....de>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"joro@...tes.org" <joro@...tes.org>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"parav@...lanox.com" <parav@...lanox.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"corbet@....net" <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
"Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
"nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Thursday, September 23, 2021 7:42 PM
>
> On Thu, Sep 23, 2021 at 03:38:10AM +0000, Tian, Kevin wrote:
> > > From: Tian, Kevin
> > > Sent: Thursday, September 23, 2021 11:11 AM
> > >
> > > >
> > > > The required behavior for iommufd is to have the IOMMU ignore the
> > > > no-snoop bit so that Intel HW can disable wbinvd. This bit should be
> > > > clearly documented for its exact purpose and if other arches also have
> > > > instructions that need to be disabled if snoop TLPs are allowed then
> > > > they can re-use this bit. It appears ARM does not have this issue and
> > > > does not need the bit.
> > >
> > > Disabling wbinvd is one purpose. imo the more important intention
> > > is that iommu vendor uses different PTE formats between snoop and
> > > !snoop. As long as we want allow userspace to opt in case of isoch
> > > performance requirement (unlike current vfio which always choose
> > > snoop format if available), such mechanism is required for all vendors.
> > >
> >
> > btw I'm not sure whether the wbinvd trick is Intel specific. All other
> > platforms (amd, arm, s390, etc.) currently always claim OMMU_CAP_
> > CACHE_COHERENCY (the source of IOMMU_CACHE).
>
> This only means they don't need to use the arch cache flush
> helpers. It has nothing to do with no-snoop on those platforms.
>
> > They didn't hit this problem because vfio always sets IOMMU_CACHE to
> > force every DMA to snoop. Will they need to handle similar
> > wbinvd-like trick (plus necessary memory type virtualization) when
> > non-snoop format is enabled? Or are their architectures highly
> > optimized to afford isoch traffic even with snoop (then fine to not
> > support user opt-in)?
>
> In other arches the question is:
> - Do they allow non-coherent DMA to exist in a VM?
And is coherency a static attribute per device or could be opted
by driver on such arch? If the latter, then the same opt path from
userspace sounds also reasonable, since driver is in userspace now.
> - Can the VM issue cache maintaince ops to fix the decoherence?
As you listed the questions are all about non-coherent DMA, not
how non-coherent DMAs are implemented underlyingly. From this
angle focusing on coherent part as Alex suggested is more forward
looking than tying the uAPI to a specific coherency implementation
using snoop?
>
> The Intel functional issue is that Intel blocks the cache maintaince
> ops from the VM and the VM has no way to self-discover that the cache
> maintaince ops don't work.
the VM doesn't need to know whether the maintenance ops
actually works. It just treats the device as if those ops are always
required. The hypervisor will figure out whether those ops should
be blocked based on whether coherency is guaranteed by iommu
based on iommufd/vfio.
>
> Other arches don't seem to have this specific problem...
I think the key is whether other archs allow driver to decide DMA
coherency and indirectly the underlying I/O page table format.
If yes, then I don't see a reason why such decision should not be
given to userspace for passthrough case.
Thanks
Kevin
Powered by blists - more mailing lists