linux-kernel - Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table

Message-ID: <aMlzLsj5slPQhWEr@google.com>
Date: Tue, 16 Sep 2025 14:24:46 +0000
From: Mostafa Saleh <smostafa@...gle.com>
To: Will Deacon <will@...nel.org>
Cc: linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
	maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
	suzuki.poulose@....com, yuzenghui@...wei.com,
	catalin.marinas@....com, robin.murphy@....com,
	jean-philippe@...aro.org, qperret@...gle.com, tabba@...gle.com,
	jgg@...pe.ca, mark.rutland@....com, praan@...gle.com
Subject: Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page
 table

On Tue, Sep 09, 2025 at 03:42:07PM +0100, Will Deacon wrote:
> On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> > Create a shadow page table for the IOMMU that shadows the
> > host CPU stage-2 into the IOMMUs to establish DMA isolation.
> > 
> > An initial snapshot is created after the driver init, then
> > on every permission change a callback would be called for
> > the IOMMU driver to update the page table.
> > 
> > For some cases, an SMMUv3 may be able to share the same page
> > table used with the host CPU stage-2 directly.
> > However, this is too strict and requires changes to the core hypervisor
> > page table code, plus it would require the hypervisor to handle IOMMU
> > page faults. This can be added later as an optimization for SMMUV3.
> > 
> > Signed-off-by: Mostafa Saleh <smostafa@...gle.com>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/iommu.h |  4 ++
> >  arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 83 ++++++++++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  5 ++
> >  3 files changed, 90 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > index 1ac70cc28a9e..219363045b1c 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > @@ -3,11 +3,15 @@
> >  #define __ARM64_KVM_NVHE_IOMMU_H__
> >  
> >  #include <asm/kvm_host.h>
> > +#include <asm/kvm_pgtable.h>
> >  
> >  struct kvm_iommu_ops {
> >  	int (*init)(void);
> > +	void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
> >  };
> >  
> >  int kvm_iommu_init(void);
> >  
> > +void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
> > +				 enum kvm_pgtable_prot prot);
> >  #endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
> > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > index a01c036c55be..f7d1c8feb358 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > @@ -4,15 +4,94 @@
> >   *
> >   * Copyright (C) 2022 Linaro Ltd.
> >   */
> > +#include <linux/iommu.h>
> > +
> >  #include <nvhe/iommu.h>
> > +#include <nvhe/mem_protect.h>
> > +#include <nvhe/spinlock.h>
> >  
> >  /* Only one set of ops supported */
> >  struct kvm_iommu_ops *kvm_iommu_ops;
> >  
> > +/* Protected by host_mmu.lock */
> > +static bool kvm_idmap_initialized;
> > +
> > +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> > +{
> > +	int iommu_prot = 0;
> > +
> > +	if (prot & KVM_PGTABLE_PROT_R)
> > +		iommu_prot |= IOMMU_READ;
> > +	if (prot & KVM_PGTABLE_PROT_W)
> > +		iommu_prot |= IOMMU_WRITE;
> > +	if (prot == PKVM_HOST_MMIO_PROT)
> > +		iommu_prot |= IOMMU_MMIO;
> 
> This looks a little odd to me.
> 
> On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
> PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
> mapped as cacheable at stage-2 because it's the job of the host to set
> the more restrictive memory type at stage-1.
> 
> Carrying that over to the SMMU would suggest that we don't care about
> IOMMU_MMIO at stage-2 at all, so why do we need to set it here?

Unlike the CPU, the host can set the SMMU to bypass. In that case the
hypervisor attaches its stage-2 with no stage-1 configured, so stage-2
must carry the correct attributes for MMIO.
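
To make the point concrete, here is a minimal standalone sketch of the
conversion (not the patch code). The flag values and the sk_/SK_ names
are stand-ins chosen for this example; the kernel headers are
authoritative. It assumes, as described above, that the MMIO protection
is the memory protection without execute, and it uses assert() where
the patch uses WARN_ON().

```c
#include <assert.h>

/* Stand-in CPU stage-2 permission bits (assumed values for this sketch). */
enum { SK_PROT_X = 1 << 0, SK_PROT_W = 1 << 1, SK_PROT_R = 1 << 2 };

/* Host memory is RWX; host MMIO is RW, i.e. the same minus execute. */
enum {
	SK_HOST_MEM_PROT  = SK_PROT_R | SK_PROT_W | SK_PROT_X,
	SK_HOST_MMIO_PROT = SK_PROT_R | SK_PROT_W,
};

/* Stand-in IOMMU protection flags (assumed values for this sketch). */
enum { SK_IOMMU_READ = 1 << 0, SK_IOMMU_WRITE = 1 << 1, SK_IOMMU_MMIO = 1 << 4 };

static int sk_to_iommu_prot(int prot)
{
	int iommu_prot = 0;

	/* Anything outside RWX is not understood by this sketch. */
	assert(!(prot & ~SK_HOST_MEM_PROT));

	if (prot & SK_PROT_R)
		iommu_prot |= SK_IOMMU_READ;
	if (prot & SK_PROT_W)
		iommu_prot |= SK_IOMMU_WRITE;
	/*
	 * With the SMMU in bypass there is no stage-1 to pick a device
	 * memory type, so the stage-2 mapping has to be marked MMIO itself.
	 */
	if (prot == SK_HOST_MMIO_PROT)
		iommu_prot |= SK_IOMMU_MMIO;

	return iommu_prot;
}
```

In this sketch an RW mapping is tagged as MMIO while an RWX mapping is
not, which is the only place the two protections differ on the IOMMU
side.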

> 
> > +	/* We don't understand that, might be dangerous. */
> > +	WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
> > +	return iommu_prot;
> > +}
> > +
> > +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> > +				  enum kvm_pgtable_walk_flags visit)
> > +{
> > +	u64 start = ctx->addr;
> > +	kvm_pte_t pte = *ctx->ptep;
> > +	u32 level = ctx->level;
> > +	u64 end = start + kvm_granule_size(level);
> > +	int prot =  IOMMU_READ | IOMMU_WRITE;
> > +
> > +	/* Keep unmapped. */
> > +	if (pte && !kvm_pte_valid(pte))
> > +		return 0;
> > +
> > +	if (kvm_pte_valid(pte))
> > +		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > +	else if (!addr_is_memory(start))
> > +		prot |= IOMMU_MMIO;
> 
> Why do we need to map MMIO regions pro-actively here? I'd have thought
> we could just do:
> 
> 	if (!kvm_pte_valid(pte))
> 		return 0;
> 
> 	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> 	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
> 	return 0;
> 
> but I think that IOMMU_MMIO is throwing me again...

We have to map everything proactively because we don't handle page
faults in the SMMUv3 driver.
Handling them would be future work, where the CPU stage-2 page table is
shared with the SMMUv3.
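
As a rough illustration of the difference, here is a standalone sketch
of the snapshot decision over a flat array of PTEs (not the patch code,
which uses the page table walker). The sk_ names, the valid bit and the
flat layout are assumptions made for this example. A zero PTE is still
mapped, a valid PTE is mapped, and only a non-zero invalid PTE is kept
unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding for this sketch: bit 0 set means the PTE is valid. */
#define SK_PTE_VALID	1ULL

struct sk_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Walk a flat array of leaf PTEs and record every range that has to be
 * present in the IOMMU page table. Returns the number of ranges recorded.
 */
static size_t sk_snapshot(const uint64_t *ptes, size_t nr, uint64_t granule,
			  struct sk_range *out, size_t max)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t pte = ptes[i];

		/* Non-zero but invalid: keep unmapped. */
		if (pte && !(pte & SK_PTE_VALID))
			continue;

		/*
		 * Valid, or zero. A zero PTE is mapped as well because
		 * nothing will fault it in later on the IOMMU side.
		 */
		assert(count < max);
		out[count].start = i * granule;
		out[count].end = (i + 1) * granule;
		count++;
	}

	return count;
}
```

With the "return 0 on !kvm_pte_valid()" variant, the zero entries would
be skipped too, and DMA to those ranges would fault with nothing to
resolve it.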

Thanks,
Mostafa

> 
> Will