[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB527679A12CF8298840D12B488CE79@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Wed, 6 Apr 2022 06:52:04 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>,
Lu Baolu <baolu.lu@...ux.intel.com>,
"Christian Benvenuti" <benve@...co.com>,
Cornelia Huck <cohuck@...hat.com>,
"David Woodhouse" <dwmw2@...radead.org>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
Jason Wang <jasowang@...hat.com>,
"Joerg Roedel" <joro@...tes.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-arm-msm@...r.kernel.org" <linux-arm-msm@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
Matthew Rosato <mjrosato@...ux.ibm.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Nelson Escobar <neescoba@...co.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Rob Clark <robdclark@...il.com>,
Robin Murphy <robin.murphy@....com>,
"Suravee Suthikulpanit" <suravee.suthikulpanit@....com>,
"virtualization@...ts.linux-foundation.org"
<virtualization@...ts.linux-foundation.org>,
Will Deacon <will@...nel.org>
CC: Christoph Hellwig <hch@....de>
Subject: RE: [PATCH 0/5] Make the iommu driver no-snoop block feature
consistent
> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, April 6, 2022 12:16 AM
>
> PCIe defines a 'no-snoop' bit in each the TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of its
> transactions as a micro-optimization.
>
> However, the driver is now responsible to synchronize the CPU cache with
> the DMA that bypassed it. On x86 this is done through the wbinvd
> instruction, and the i915 GPU driver is the only Linux DMA driver that
> calls it.
More accurately x86 supports both unprivileged clflush instructions
to invalidate one cacheline and a privileged wbinvd instruction to
invalidate the entire cache. Replacing 'this is done' with 'this may
be done' is clearer.
>
> The problem comes that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running in the
> guest is not aware the wbinvd doesn't work it may still cause the device
> to set the no-snoop bit and the platform will bypass the CPU cache.
> Without a working wbinvd there is no way to re-synchronize the CPU cache
> and the driver in the VM has data corruption.
>
> Thus, we see a general direction on x86 that the IOMMU HW is able to block
> the no-snoop bit in the TLP. This NOP's the optimization and allows KVM to
> to NOP the wbinvd without causing any data corruption.
>
> This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have
> multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused.
Also point out your finding about AMD IOMMU?
>
> Change it so that:
> - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
> device. It is used by the DMA API and set when the DMA API will not be
> doing manual cache coherency operations.
>
> - dev_is_dma_coherent() indicates if IOMMU_CACHE can be used with the
> device
>
> - The new optional domain op enforce_cache_coherency() will cause the
> entire domain to block no-snoop requests - ie there is no way for any
> device attached to the domain to opt out of the IOMMU_CACHE behavior.
>
> An iommu driver should implement enforce_cache_coherency() so that by
> default domains allow the no-snoop optimization. This leaves it available
> to kernel drivers like i915. VFIO will call enforce_cache_coherency()
> before establishing any mappings and the domain should then permanently
> block no-snoop.
>
> If enforce_cache_coherency() fails VFIO will communicate back through to
> KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
>
> While other arches are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in doing
> so.
>
> After this series there are only two calls left to iommu_capable() with a
> bus argument which should help Robin's work here.
>
> This is on github:
> https://github.com/jgunthorpe/linux/commits/intel_no_snoop
>
> Cc: "Tian, Kevin" <kevin.tian@...el.com>
> Cc: Robin Murphy <robin.murphy@....com>
> Cc: Alex Williamson <alex.williamson@...hat.com>
> Cc: Christoph Hellwig <hch@....de>
> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
>
> Jason Gunthorpe (5):
> iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with
> dev_is_dma_coherent()
> vfio: Require that devices support DMA cache coherence
> iommu: Introduce the domain op enforce_cache_coherency()
> vfio: Move the Intel no-snoop control off of IOMMU_CACHE
> iommu: Delete IOMMU_CAP_CACHE_COHERENCY
>
> drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++------
> drivers/iommu/amd/iommu.c | 9 +++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 --
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 -----
> drivers/iommu/arm/arm-smmu/qcom_iommu.c | 6 -----
> drivers/iommu/fsl_pamu_domain.c | 6 -----
> drivers/iommu/intel/iommu.c | 15 ++++++++---
> drivers/iommu/s390-iommu.c | 2 --
> drivers/vfio/vfio.c | 6 +++++
> drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++--------
> drivers/vhost/vdpa.c | 3 ++-
> include/linux/intel-iommu.h | 1 +
> include/linux/iommu.h | 6 +++--
> 13 files changed, 58 insertions(+), 50 deletions(-)
>
>
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17
> --
> 2.35.1
Powered by blists - more mailing lists