lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aNHbUqTi4nOK_w5I@linux.dev>
Date: Mon, 22 Sep 2025 16:27:14 -0700
From: Oliver Upton <oliver.upton@...ux.dev>
To: Priscilla Lam <prl@...zon.com>
Cc: maz@...nel.org, joey.gouly@....com, suzuki.poulose@....com,
	yuzenghui@...wei.com, dwmw@...zon.co.uk, gurugubs@...zon.com,
	christoffer.dall@....com, graf@...zon.com,
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: arm64: Implement KVM_TRANSLATE ioctl for arm64

Hi Priscilla,

On Mon, Sep 22, 2025 at 01:24:52PM -0700, Priscilla Lam wrote:
> There is a KVM_TRANSLATE ioctl for x86 to translate a GVA
> (guest virtual address) to a GPA (guest physical address) in EL1
> which is not yet implemented for arm64.
> 
> Implement KVM_TRANSLATE on arm64 for both configurations that
> support and do not support VHE. The VHE path uses the AT
> instruction directly while the non-VHE implementation wraps the
> AT call in a hypercall to allow for its execution in EL2. Add
> selftest that tests the ioctl in both configurations.
> 
> Signed-off-by: Priscilla Lam <prl@...zon.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |   2 +
>  arch/arm64/kvm/guest.c                        |  89 ++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/Makefile              |   3 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  10 ++
>  arch/arm64/kvm/hyp/nvhe/translate.c           |  84 ++++++++++++++
>  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
>  tools/testing/selftests/kvm/arm64/translate.c | 107 ++++++++++++++++++

Please do selftests changes in a separate patch.

> +static int kvm_translate_vhe(struct kvm_vcpu *vcpu, struct kvm_translation *tr)

So arch/arm64/kvm/at.c exists for this exact purpose: walking guest page
tables. While it was previously constrained to handling NV-enabled VMs,
Marc's SEA TTW series opens up the stage-1 walker for general use.

I am not keen on adding yet another page-table walker implementation for
the sole purpose of servicing this ioctl.

https://lore.kernel.org/kvmarm/175845226586.1792635.11249464012956393475.b4-ty@kernel.org

> +{
> +	unsigned long flags;
> +	uint64_t hcr_old, hcr_new, par;
> +	const uint64_t gva = tr->linear_address;

"linear_address" is a delightful x86-ism. I'd prefer naming that was
either architecture-generic -or- an arm64-specific struct that used our
architectural terms.

> +	preempt_disable();
> +	local_irq_save(flags);
> +
> +	/* Ensure we're in the expected VHE regime and enable S2 so PAR returns IPA. */
> +	hcr_old = read_sysreg(hcr_el2);
> +	hcr_new = hcr_old | HCR_E2H | HCR_VM;
> +	hcr_new &= ~HCR_TGE;
> +	write_sysreg(hcr_new, hcr_el2);
> +	isb();

Thanks to borken hardware, this needs to go through the write_sysreg_hcr()
accessor.

> +
> +	/* Load guest EL1 S1 context into *_EL12 (do not write into _EL1). */
> +	write_sysreg_s(vcpu_read_sys_reg(vcpu, TTBR0_EL1), SYS_TTBR0_EL12);
> +	write_sysreg_s(vcpu_read_sys_reg(vcpu, TTBR1_EL1), SYS_TTBR1_EL12);
> +	write_sysreg_s(vcpu_read_sys_reg(vcpu, TCR_EL1), SYS_TCR_EL12);
> +	write_sysreg_s(vcpu_read_sys_reg(vcpu, MAIR_EL1), SYS_MAIR_EL12);
> +	write_sysreg_s(vcpu_read_sys_reg(vcpu, SCTLR_EL1), SYS_SCTLR_EL12);

KVM supports both FEAT_S1PIE and FEAT_S1POE, so this is not a complete
MMU context.

> +	/* Check address read */
> +	asm volatile("at s1e1r, %0" :: "r"(gva));
> +	isb();

The AT instruction can generate an exception, which is why __kvm_at()
exists.

And this is where reusing the existing translation infrastructure is
really important. The AT instruction *will* fail if the stage-1
translation tables are unmapped at stage-2. The only option at that
point is falling back to a software table walker that potentially faults
in the missing translation tables.

> +
> +	par = read_sysreg(par_el1);
> +	if (!(par & 1)) {
> +		tr->valid = true;
> +		tr->physical_address = par_to_ipa(par, gva);

What about permissions besides RW?

> +int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, struct kvm_translation *tr)
> +{
> +	if (has_vhe())
> +		return kvm_translate_vhe(vcpu, tr);
> +	else
> +		return kvm_translate_nvhe(vcpu, tr);
>  }

Yet another interesting consideration around this entire infrastructure
is the guest's view of the translation that the VMM will now use. KVM uses a
pseudo-TLB for the guest's VNCR page and maintains it just like a literal
TLB.

How would the guest invalidate the translation fetched by the VMM when
unmapping/remapping the VA? Doesn't the stage-1 PTW need to set the
Access flag as this amounts to a TLB fill?

Understanding what it is you're trying to accomplish would be quite
helpful. I'm concerned this trivializes some of the gory details of
participating in the guest's virtual memory.

Thanks,
Oliver

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ