Message-ID: <20250923082955.66602-1-prl@amazon.com>
Date: Tue, 23 Sep 2025 01:29:55 -0700
From: Priscilla Lam <prl@...zon.com>
To: <maz@...nel.org>, <oliver.upton@...ux.dev>
CC: <christoffer.dall@....com>, <dwmw@...zon.co.uk>, <graf@...zon.com>,
	<gurugubs@...zon.com>, <jgrall@...zon.co.uk>, <joey.gouly@....com>,
	<kvmarm@...ts.linux.dev>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-kernel@...r.kernel.org>, <prl@...zon.com>, <suzuki.poulose@....com>,
	<yuzenghui@...wei.com>
Subject: Re: Re: [PATCH] KVM: arm64: Implement KVM_TRANSLATE ioctl for arm64

Hi Oliver and Marc,

Thanks for the detailed feedback.

> But at the end of the day, what do you need KVM_TRANSLATE for? This
> interface is an absolute turd that is unable to represent the bare
> minimum of the architecture (writable by whom? physical address in
> which translation regime? what about S2 translations?), and is better
> left in the "utter brain fart" category.

Regarding motivation, this patch is intended to give a userspace VMM
the ability to handle non-ISV guest faults. The Arm ARM (DDI 0487L.b,
section B3.13.6) notes that for load/store pair faults, the syndrome
may not provide the specifics of the access that faulted. In those
cases, the VMM must decode the instruction itself in order to emulate
it. The introduction of KVM_CAP_ARM_NISV_TO_USER
(https://lore.kernel.org/kvm/20191120164236.29359-2-maz@kernel.org/)
seems to have anticipated that flow by allowing exits to userspace on
trapped NISV instructions. What is still missing is a reliable way for
userspace to query VA->IPA translations in order to complete emulation.
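
To make the decode step concrete, here is a minimal sketch of what a VMM
might do for the general-purpose LDP/STP forms once it has fetched the
faulting instruction word. The struct and function names
(ldst_pair, decode_ldst_pair, ldst_pair_va) are hypothetical and not part
of the patch; SIMD/FP pairs, LDPSW/STGP and the no-allocate forms are
deliberately rejected to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an A64 general-purpose LDP/STP. */
struct ldst_pair {
	bool is_load;        /* L bit: true for LDP, false for STP */
	bool writeback;      /* pre- or post-indexed form */
	bool post_index;     /* offset applied after the access */
	unsigned int rt;     /* first transfer register */
	unsigned int rt2;    /* second transfer register */
	unsigned int rn;     /* base register (31 means SP) */
	unsigned int size;   /* bytes per register: 4 or 8 */
	int64_t offset;      /* scaled, sign-extended imm7 */
};

/* Returns false for anything that is not a 32/64-bit GP LDP/STP. */
static bool decode_ldst_pair(uint32_t insn, struct ldst_pair *out)
{
	unsigned int opc = insn >> 30;
	unsigned int mode = (insn >> 23) & 0x7;
	int32_t imm7 = (insn >> 15) & 0x7f;

	/* Load/store pair class: bits [29:27] == 0b101, V (bit 26) == 0. */
	if (((insn >> 27) & 0x7) != 0x5 || ((insn >> 26) & 0x1))
		return false;
	/* opc 00 = 32-bit, 10 = 64-bit; 01 and 11 are not handled here. */
	if (opc != 0 && opc != 2)
		return false;
	/* 001 = post-index, 010 = signed offset, 011 = pre-index. */
	if (mode < 1 || mode > 3)
		return false;

	if (imm7 & 0x40)         /* sign-extend the 7-bit immediate */
		imm7 -= 0x80;

	out->is_load = (insn >> 22) & 0x1;
	out->post_index = (mode == 1);
	out->writeback = (mode != 2);
	out->rt = insn & 0x1f;
	out->rn = (insn >> 5) & 0x1f;
	out->rt2 = (insn >> 10) & 0x1f;
	out->size = (opc == 2) ? 8 : 4;
	out->offset = (int64_t)imm7 * out->size;
	return true;
}

/* VA of the first register's access, given the current base value. */
static uint64_t ldst_pair_va(uint64_t base, const struct ldst_pair *p)
{
	return p->post_index ? base : base + (uint64_t)p->offset;
}
```

The VA computed this way is a guest virtual address, so the VMM still
needs a VA->IPA translation before it can touch guest memory, which is
the gap described above.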

> Please do selftests changes in a separate patch.

Ack, will split the kernel changes and selftests into 1/2 and 2/2.

> So arch/arm64/kvm/at.c exists for this exact purpose: walking guest page
> tables. While it was previously constrained to handling NV-enabled VMs,
> Marc's SEA TTW series opens up the stage-1 walker for general use.

Thanks for the reference, I wasn't aware of this. I'll drop the bespoke
VHE/nVHE paths and use the shared S1 walker in v2.

> "linear_address" is a delightful x86-ism. I'd prefer naming that was
> either architecture-generic -or- an arm64-specific struct that used our
> architectural terms.

I'll switch the internal naming to VA/IPA. For the uAPI, I'll retain the
existing linear_address field for compatibility and map it to the VA
internally.
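
As a rough illustration of that mapping (not the v2 code): the first
struct below is a local mirror of the existing struct kvm_translation
layout from <linux/kvm.h>, while struct s1_xlate and xlate_to_uapi() are
hypothetical names made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uAPI struct kvm_translation field layout. */
struct kvm_translation_mirror {
	uint64_t linear_address;    /* in: guest VA on arm64 */
	uint64_t physical_address;  /* out: IPA on arm64 */
	uint8_t valid;
	uint8_t writeable;
	uint8_t usermode;
	uint8_t pad[5];
};

/* Hypothetical internal result using architectural terms. */
struct s1_xlate {
	uint64_t va;
	uint64_t ipa;
	bool valid;
	bool writable;
	bool el0;
};

/* Copy an internal VA/IPA result into the x86-named uAPI fields. */
static void xlate_to_uapi(const struct s1_xlate *x,
			  struct kvm_translation_mirror *tr)
{
	memset(tr, 0, sizeof(*tr));
	tr->linear_address = x->va;
	tr->valid = x->valid;
	if (!x->valid)
		return;             /* leave the outputs zeroed on failure */
	tr->physical_address = x->ipa;
	tr->writeable = x->writable;
	tr->usermode = x->el0;
}
```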

> Thanks to borken hardware, this needs to go through the write_sysreg_hcr()
> accessor.

Ack, will use write_sysreg_hcr().

> KVM supports both FEAT_S1PIE and FEAT_S1POE, so this is not a complete
> MMU context.

Understood. v2 will rely on the shared walker to avoid missing S1PIE/S1POE.

> The AT instruction can generate an exception, which is why __kvm_at()
> exists.
>
> And this is where reusing the existing translation infrastructure is
> really important. The AT instruction *will* fail if the stage-1
> translation tables are unmapped at stage-2. The only option at that
> point is falling back to a software table walker that potentially faults
> in the missing translation tables.

v2 will use __kvm_at() and the fallback software walk.

> What about permissions besides RW?

I'll add support for the additional permission bits (execute and EL0
access) in v2.

> Yet another interesting consideration around this entire infrastructure
> is the guest's view of the translation that the VMM will now use. KVM
> uses a pseudo-TLB for the guest's VNCR page and maintains it just like a
> literal TLB.
>
> How would the guest invalidate the translation fetched by the VMM when
> unmapping/remapping the VA? Doesn't the stage-1 PTW need to set the
> Access flag as this amounts to a TLB fill?
> Understanding what it is you're trying to accomplish would be quite
> helpful. I'm concerned this trivializes some of the gory details of
> participating in the guest's virtual memory.

My intent is for this ioctl to be side-effect free, with no AF updates
and no guest-visible TLB fills. I'll send v2 as two patches with the
above changes.

Thanks,
Priscilla
