[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210923114219.GG964074@nvidia.com>
Date: Thu, 23 Sep 2021 08:42:19 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
"Liu, Yi L" <yi.l.liu@...el.com>, "hch@....de" <hch@....de>,
"jasowang@...hat.com" <jasowang@...hat.com>,
"joro@...tes.org" <joro@...tes.org>,
"jean-philippe@...aro.org" <jean-philippe@...aro.org>,
"parav@...lanox.com" <parav@...lanox.com>,
"lkml@...ux.net" <lkml@...ux.net>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"lushenming@...wei.com" <lushenming@...wei.com>,
"eric.auger@...hat.com" <eric.auger@...hat.com>,
"corbet@....net" <corbet@....net>,
"Raj, Ashok" <ashok.raj@...el.com>,
"yi.l.liu@...ux.intel.com" <yi.l.liu@...ux.intel.com>,
"Tian, Jun J" <jun.j.tian@...el.com>, "Wu, Hao" <hao.wu@...el.com>,
"Jiang, Dave" <dave.jiang@...el.com>,
"jacob.jun.pan@...ux.intel.com" <jacob.jun.pan@...ux.intel.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"dwmw2@...radead.org" <dwmw2@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
"david@...son.dropbear.id.au" <david@...son.dropbear.id.au>,
"nicolinc@...dia.com" <nicolinc@...dia.com>
Subject: Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO
On Thu, Sep 23, 2021 at 03:38:10AM +0000, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Thursday, September 23, 2021 11:11 AM
> >
> > >
> > > The required behavior for iommufd is to have the IOMMU ignore the
> > > no-snoop bit so that Intel HW can disable wbinvd. This bit should be
> > > clearly documented for its exact purpose and if other arches also have
> > > instructions that need to be disabled if snoop TLPs are allowed then
> > > they can re-use this bit. It appears ARM does not have this issue and
> > > does not need the bit.
> >
> > Disabling wbinvd is one purpose. imo the more important intention
> > is that iommu vendor uses different PTE formats between snoop and
> > !snoop. As long as we want allow userspace to opt in case of isoch
> > performance requirement (unlike current vfio which always choose
> > snoop format if available), such mechanism is required for all vendors.
> >
>
> btw I'm not sure whether the wbinvd trick is Intel specific. All other
> platforms (amd, arm, s390, etc.) currently always claim OMMU_CAP_
> CACHE_COHERENCY (the source of IOMMU_CACHE).
This only means they don't need to use the arch cache flush
helpers. It has nothing to do with no-snoop on those platforms.
> They didn't hit this problem because vfio always sets IOMMU_CACHE to
> force every DMA to snoop. Will they need to handle similar
> wbinvd-like trick (plus necessary memory type virtualization) when
> non-snoop format is enabled? Or are their architectures highly
> optimized to afford isoch traffic even with snoop (then fine to not
> support user opt-in)?
In other arches the question is:
- Do they allow non-coherent DMA to exist in a VM?
- Can the VM issue cache maintaince ops to fix the decoherence?
The Intel functional issue is that Intel blocks the cache maintaince
ops from the VM and the VM has no way to self-discover that the cache
maintaince ops don't work.
Other arches don't seem to have this specific problem...
The other warped part of this is that Linux doesn't actually support
no-snoop DMA through the DMA API. The users in Intel GPU drivers are
all hacking it up, so it may well be that on other arches Linux never
ask devices to issue no-snoop DMA because there is no portable way
for the driver to restore coherence on a DMA by DMA basis..
Jason
Powered by blists - more mailing lists