Message-Id: <20240507061802.20184-1-yan.y.zhao@intel.com>
Date: Tue, 7 May 2024 14:18:02 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
x86@...nel.org,
alex.williamson@...hat.com,
jgg@...dia.com,
kevin.tian@...el.com
Cc: iommu@...ts.linux.dev,
pbonzini@...hat.com,
seanjc@...gle.com,
dave.hansen@...ux.intel.com,
luto@...nel.org,
peterz@...radead.org,
tglx@...utronix.de,
mingo@...hat.com,
bp@...en8.de,
hpa@...or.com,
corbet@....net,
joro@...tes.org,
will@...nel.org,
robin.murphy@....com,
baolu.lu@...ux.intel.com,
yi.l.liu@...el.com,
Yan Zhao <yan.y.zhao@...el.com>
Subject: [PATCH 0/5] Enforce CPU cache flush for non-coherent device assignment
This is a follow-up series to fix the security risk for non-coherent device
assignment raised by Jason in [1].
When the IOMMU does not enforce cache coherency, devices are allowed to
perform non-coherent DMAs (DMAs that do not snoop CPU caches). This poses a
risk of information leakage when such a device is assigned to a VM.
Specifically, a malicious guest could potentially retrieve stale host data
through non-coherent DMA reads of physical memory, while data initialized
by the host (e.g., zeros) still resides in the cache.
Furthermore, the host kernel (e.g., a ksm thread) might encounter
inconsistent data between the CPU cache and physical memory (left by a
malicious guest) after a page is unpinned for DMA but before the page is
recycled.
Therefore, a mitigation in VFIO/IOMMUFD is required to flush CPU caches on
pages involved in non-coherent DMAs before they are mapped into the IOMMU
and after they are unmapped from it.
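The flush points can be pictured with a small userspace model. This is only
a sketch: the event names, struct and functions below are hypothetical and
are not kernel symbols, and the pin step before mapping is an assumption
inferred from the unpin step mentioned in this letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names for illustration; none are kernel symbols. */
enum dma_event { EV_PIN, EV_FLUSH, EV_MAP, EV_UNMAP, EV_UNPIN };

struct event_log {
	enum dma_event ev[8];
	size_t n;
};

static void log_event(struct event_log *log, enum dma_event e)
{
	assert(log->n < sizeof(log->ev) / sizeof(log->ev[0]));
	log->ev[log->n++] = e;
}

/* Make a page reachable by device DMA (pin step is an assumption). */
static void model_map(struct event_log *log, int domain_coherent)
{
	log_event(log, EV_PIN);
	if (!domain_coherent)
		log_event(log, EV_FLUSH);	/* before the device can access the page */
	log_event(log, EV_MAP);
}

/* Remove the page from the device's view, then release it. */
static void model_unmap(struct event_log *log, int domain_coherent)
{
	log_event(log, EV_UNMAP);
	if (!domain_coherent)
		log_event(log, EV_FLUSH);	/* after device access ends, before unpin */
	log_event(log, EV_UNPIN);
}
```

In this model a coherent domain records no flush events at all, which
matches the intent that only non-coherent domains pay the flushing cost.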
The mitigation is not implemented in the DMA API layer, to avoid slowing
down DMA API users. Those users are expected to take care of CPU cache
flushing in one of two ways: (a) use the DMA API, which is aware of the
non-coherence and does the flushes internally, or (b) be aware of their own
flushing needs and handle them explicitly if they override the platform by
using no-snoop. A general mitigation in the DMA API layer would only be
warranted if non-coherent DMAs were common, which is not the case today
(currently only Intel GPUs and some ARM devices).
Nor is the mitigation implemented in the IOMMU core exclusively for VMs,
because the IOMMU core has no information about the IOVA-to-PFN
relationship, so a large IOTLB flush range would have to be split.
Given that non-coherent devices exist on both x86 and ARM, this series
introduces an arch helper to flush CPU caches for non-coherent DMAs. The
helper is available to both VFIO and IOMMUFD, though currently only an x86
implementation is provided.
Series Layout:
Patch 1 first fixes an error in pat_pfn_immune_to_uc_mtrr() which always
returns WB for untracked PAT ranges. This error leads to KVM
treating all PFNs within these untracked PAT ranges as cacheable
memory types, even when a PFN's MTRR type is UC. (One example is the
VGA range 0xa0000-0xbffff.)
Patch 3 will use pat_pfn_immune_to_uc_mtrr() to determine
uncacheable PFNs.
Patch 2 is a side fix in KVM to prevent guest cacheable access to PFNs
mapped as UC in host.
Patch 3 introduces and exports an arch helper arch_clean_nonsnoop_dma() to
flush CPU cachelines. It takes a physical address and a size as inputs
and provides an implementation for x86.
Executing CLFLUSH on certain MMIO ranges on x86 can be problematic,
potentially causing machine check exceptions on some platforms, while
flushing is necessary on some other MMIO ranges (e.g., some MMIO
ranges for PMEM). This patch therefore determines cacheability by
consulting the PAT (if enabled) or the MTRR type (if PAT is disabled)
to assess whether the host considers a PFN uncacheable. For reserved
pages or !pfn_valid() PFNs, CLFLUSH is skipped if the PFN is
recognized as uncacheable on the host.
Patches 4 and 5 implement a mitigation in vfio/iommufd to flush CPU caches
- before a page becomes accessible to non-coherent DMAs, and
- after the page becomes inaccessible to non-coherent DMAs, right
before it is unpinned for DMA.
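The CLFLUSH decision described for patch 3 can be summarized as a small
predicate. The sketch below is a userspace model under stated assumptions:
struct pfn_facts and should_clflush() are hypothetical names, not code from
the patch, and treating ordinary (valid, non-reserved) pages as always
flushed is one reading of the description above.

```c
#include <stdbool.h>

/*
 * Hypothetical per-PFN facts. In the kernel these would come from
 * pfn_valid(), the page's reserved flag, and the PAT/MTRR lookup.
 */
struct pfn_facts {
	bool valid;		/* pfn_valid() would return true */
	bool reserved;		/* page is marked reserved */
	bool host_uncacheable;	/* host PAT (or MTRR if PAT is off) says UC */
};

/*
 * Return true if the model would issue CLFLUSH for this PFN.
 * Only reserved or !pfn_valid() PFNs that the host maps as
 * uncacheable are skipped; everything else is flushed.
 */
static bool should_clflush(struct pfn_facts f)
{
	bool special = !f.valid || f.reserved;

	if (special && f.host_uncacheable)
		return false;
	return true;
}
```

The point of the split is that a cacheable MMIO range (such as some PMEM
ranges) still gets flushed, while an uncacheable MMIO range is never touched
by CLFLUSH.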
Performance data:
The overhead of flushing CPU caches is measured below:
CPU MHz: 4494.377, 4 vCPUs, 8G guest memory
Pass-through GPU: 1G aperture
Across each VM boot up and tear down,
IOMMUFD     | Map            | Unmap          | Teardown
------------|----------------|----------------|--------------
w/o clflush | 1167M          | 40M            | 201M
w/ clflush  | 2400M (+1233M) | 276M (+236M)   | 1160M (+959M)
Map = total cycles of iommufd_ioas_map() during VM boot up
Unmap = total cycles of iommufd_ioas_unmap() during VM boot up
Teardown = total cycles of iommufd_hwpt_paging_destroy() at VM teardown
VFIO        | Map            | Unmap          | Teardown
------------|----------------|----------------|---------------
w/o clflush | 3058M          | 379M           | 448M
w/ clflush  | 5664M (+2606M) | 1653M (+1274M) | 1522M (+1074M)
Map = total cycles of vfio_dma_do_map() during VM boot up
Unmap = total cycles of vfio_dma_do_unmap() during VM boot up
Teardown = total cycles of vfio_iommu_type1_detach_group() at VM teardown
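To put the cycle counts above in relative terms, the following helpers
compute the percentage overhead and convert millions of cycles to
milliseconds. This is a sketch only: the function names are hypothetical,
and the time conversion assumes the cycles were counted at the quoted
4494.377 MHz, which the measurements above do not state explicitly.

```c
#include <stdint.h>

/*
 * Relative overhead in percent, rounded to the nearest integer.
 * Both arguments use the same unit (e.g. millions of cycles).
 */
static uint64_t overhead_percent(uint64_t base_cycles, uint64_t flush_cycles)
{
	uint64_t delta;

	if (base_cycles == 0 || flush_cycles < base_cycles)
		return 0;
	delta = flush_cycles - base_cycles;
	return (delta * 100 + base_cycles / 2) / base_cycles;
}

/*
 * Convert millions of cycles to milliseconds.
 * Assumption: cycles tick at a constant 'mhz' megahertz.
 */
static double mcycles_to_ms(double mcycles, double mhz)
{
	return mcycles / mhz * 1000.0;
}
```

For example, overhead_percent(1167, 2400) gives 106 for the IOMMUFD map
path, and mcycles_to_ms(1233, 4494.377) puts the added map cost at roughly
274 ms under the clock assumption above.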
[1] https://lore.kernel.org/lkml/20240109002220.GA439767@nvidia.com
Yan Zhao (5):
x86/pat: Let pat_pfn_immune_to_uc_mtrr() check MTRR for untracked PAT
range
KVM: x86/mmu: Fine-grained check of whether a invalid & RAM PFN is
MMIO
x86/mm: Introduce and export interface arch_clean_nonsnoop_dma()
vfio/type1: Flush CPU caches on DMA pages in non-coherent domains
iommufd: Flush CPU caches on DMA pages in non-coherent domains
arch/x86/include/asm/cacheflush.h | 3 +
arch/x86/kvm/mmu/spte.c | 14 +++-
arch/x86/mm/pat/memtype.c | 12 +++-
arch/x86/mm/pat/set_memory.c | 88 +++++++++++++++++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 19 +++++-
drivers/iommu/iommufd/io_pagetable.h | 5 ++
drivers/iommu/iommufd/iommufd_private.h | 1 +
drivers/iommu/iommufd/pages.c | 44 ++++++++++++-
drivers/vfio/vfio_iommu_type1.c | 51 ++++++++++++++
include/linux/cacheflush.h | 6 ++
10 files changed, 237 insertions(+), 6 deletions(-)
base-commit: e67572cd2204894179d89bd7b984072f19313b03
--
2.17.1