Message-ID: <aNakIop0jdz9gH-t@willie-the-truck>
Date: Fri, 26 Sep 2025 15:33:06 +0100
From: Will Deacon <will@...nel.org>
To: Mostafa Saleh <smostafa@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvmarm@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
suzuki.poulose@....com, yuzenghui@...wei.com,
catalin.marinas@....com, robin.murphy@....com,
jean-philippe@...aro.org, qperret@...gle.com, tabba@...gle.com,
jgg@...pe.ca, mark.rutland@....com, praan@...gle.com
Subject: Re: [PATCH v4 02/28] KVM: arm64: Donate MMIO to the hypervisor
On Tue, Sep 16, 2025 at 01:27:39PM +0000, Mostafa Saleh wrote:
> On Tue, Sep 09, 2025 at 03:12:45PM +0100, Will Deacon wrote:
> > On Tue, Aug 19, 2025 at 09:51:30PM +0000, Mostafa Saleh wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 861e448183fd..c9a15ef6b18d 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -799,6 +799,70 @@ int ___pkvm_host_donate_hyp(u64 pfn, u64 nr_pages, enum kvm_pgtable_prot prot)
> > > return ret;
> > > }
> > >
> > > +int __pkvm_host_donate_hyp_mmio(u64 pfn)
> > > +{
> > > + u64 phys = hyp_pfn_to_phys(pfn);
> > > + void *virt = __hyp_va(phys);
> > > + int ret;
> > > + kvm_pte_t pte;
> > > +
> > > + host_lock_component();
> > > + hyp_lock_component();
> > > +
> > > + ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
> > > + if (ret)
> > > + goto unlock;
> > > +
> > > + if (pte && !kvm_pte_valid(pte)) {
> > > + ret = -EPERM;
> > > + goto unlock;
> > > + }
> >
> > Shouldn't we first check that the pfn is indeed MMIO? Otherwise, testing
> > the pte for the ownership information isn't right.
>
> I will add it, although the input should be trusted as it comes from the
> hypervisor SMMUv3 driver.
(more on this below)
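
For concreteness, the shape of check I have in mind is just an early
bail-out before the pte gets interpreted. Sketch only: the
addr_is_memory() test is an assumption about what is (or could be made)
usable from this path.

/*
 * Sketch, not the actual patch: reject RAM pfns up front so that the
 * host pte really does encode MMIO ownership state when we look at it
 * below.
 */
int __pkvm_host_donate_hyp_mmio(u64 pfn)
{
	u64 phys = hyp_pfn_to_phys(pfn);
	kvm_pte_t pte;
	int ret;

	if (addr_is_memory(phys))	/* assumed helper */
		return -EINVAL;

	host_lock_component();
	hyp_lock_component();

	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
	if (ret)
		goto unlock;

	/* Non-zero but invalid => owned by someone other than the host. */
	if (pte && !kvm_pte_valid(pte)) {
		ret = -EPERM;
		goto unlock;
	}

	/* ... map at EL2 and annotate the host entry, as in the patch ... */
unlock:
	hyp_unlock_component();
	host_unlock_component();
	return ret;
}
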
> > > +int __pkvm_hyp_donate_host_mmio(u64 pfn)
> > > +{
> > > + u64 phys = hyp_pfn_to_phys(pfn);
> > > + u64 virt = (u64)__hyp_va(phys);
> > > + size_t size = PAGE_SIZE;
> > > +
> > > + host_lock_component();
> > > + hyp_lock_component();
> >
> > Shouldn't we check that:
> >
> > 1. pfn is mmio
> > 2. pfn is owned by hyp
> > 3. The host doesn't have something mapped at pfn already
> >
> > ?
> >
>
> I thought about this initially, but since:
> - This code is only called from the hypervisor with trusted
>   inputs (only at boot)
> - It is only called on the error path
>
> a WARN_ON in case of failure to unmap MMIO pages seemed good enough,
> and avoids the extra code.
>
> But I can add the checks if you think they are necessary; we will need
> to add new helpers for MMIO state, though.
I'd personally prefer to put the checks here so that callers don't have
to worry (or forget!) about them. That also means that the donation
function can be readily reused in the same way as the existing functions
which operate on memory pages.
How much work is it to add the MMIO helpers?
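
Roughly, I'm thinking of something like the below. This is only a
sketch: the addr_is_memory() check and the exact way of testing the
host/hyp state are assumptions, and tidying that up would be part of
adding the MMIO helpers.

/*
 * Sketch of the checks discussed above; anything not already in the
 * quoted hunk is illustrative.
 */
int __pkvm_hyp_donate_host_mmio(u64 pfn)
{
	u64 phys = hyp_pfn_to_phys(pfn);
	u64 virt = (u64)__hyp_va(phys);
	kvm_pte_t pte;
	int ret;

	/* 1. Only MMIO may take this path (assumed helper). */
	if (addr_is_memory(phys))
		return -EINVAL;

	host_lock_component();
	hyp_lock_component();

	/* 2. The page must currently be mapped at EL2, i.e. owned by hyp. */
	ret = kvm_pgtable_get_leaf(&pkvm_pgtable, virt, &pte, NULL);
	if (ret || !kvm_pte_valid(pte)) {
		ret = -EPERM;
		goto unlock;
	}

	/* 3. The host must not already have a valid mapping at this address. */
	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, NULL);
	if (!ret && kvm_pte_valid(pte)) {
		ret = -EPERM;
		goto unlock;
	}

	WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, PAGE_SIZE) != PAGE_SIZE);
	ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
			      PAGE_SIZE, &host_s2_pool, PKVM_ID_HOST);
unlock:
	hyp_unlock_component();
	host_unlock_component();
	return ret;
}
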
> > > + WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
> > > + WARN_ON(host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt, phys,
> > > + PAGE_SIZE, &host_s2_pool, PKVM_ID_HOST));
> > > + hyp_unlock_component();
> > > + host_unlock_component();
> > > +
> > > + return 0;
> > > +}
> > > +
> > > int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
> > > {
> > > return ___pkvm_host_donate_hyp(pfn, nr_pages, PAGE_HYP);
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index c351b4abd5db..ba06b0c21d5a 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -1095,13 +1095,8 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
> > > kvm_pte_t *childp = NULL;
> > > bool need_flush = false;
> > >
> > > - if (!kvm_pte_valid(ctx->old)) {
> > > - if (stage2_pte_is_counted(ctx->old)) {
> > > - kvm_clear_pte(ctx->ptep);
> > > - mm_ops->put_page(ctx->ptep);
> > > - }
> > > - return 0;
> > > - }
> > > + if (!kvm_pte_valid(ctx->old))
> > > + return stage2_pte_is_counted(ctx->old) ? -EPERM : 0;
> >
> > Can this code be reached for the guest? For example, if
> > pkvm_pgtable_stage2_destroy() runs into an MMIO-guarded pte on teardown?
>
> AFAICT, a VM's page table is destroyed from reclaim_pgtable_pages() =>
> kvm_pgtable_stage2_destroy() => kvm_pgtable_stage2_destroy_range() ... =>
> stage2_free_walker()
>
> That path doesn't go through stage2_unmap_walker(), so it should be
> fine.
Fair enough. I feel like this might bite us later on but, with what you
have, we'll see the -EPERM and can figure out what to do at that point.
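
To make the failure mode concrete, this is roughly what a caller will
observe with that hunk in place (illustrative only; the caller below is
made up):

/* Illustrative only: how the new -EPERM surfaces to an unmap caller. */
static void unmap_range_example(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
	int ret = kvm_pgtable_stage2_unmap(pgt, addr, size);

	/*
	 * With the change above, an invalid-but-counted pte (i.e. an
	 * ownership annotation) within the range now fails the unmap with
	 * -EPERM instead of having its annotation silently cleared.
	 */
	WARN_ON(ret == -EPERM);
}
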
Will