Message-ID: <aFNmci0s1_P845XZ@google.com>
Date: Wed, 18 Jun 2025 18:22:58 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: James Houghton <jthoughton@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, 
	Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>, Yan Zhao <yan.y.zhao@...el.com>, 
	Nikita Kalyazin <kalyazin@...zon.com>, Anish Moorthy <amoorthy@...gle.com>, 
	Peter Gonda <pgonda@...gle.com>, Peter Xu <peterx@...hat.com>, 
	David Matlack <dmatlack@...gle.com>, wei.w.wang@...el.com, kvm@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev
Subject: Re: [PATCH v3 03/15] KVM: arm64: x86: Require "struct kvm_page_fault"
 for memory fault exits

On Wed, Jun 18, 2025, Oliver Upton wrote:
> On Wed, Jun 18, 2025 at 01:47:36PM -0700, Sean Christopherson wrote:
> > On Wed, Jun 18, 2025, Oliver Upton wrote:
> > > What I would like to see on arm64 is that for every "KVM_EXIT_MEMORY_FAULT"
> > > we provide as much syndrome information as possible. That could imply
> > > some combination of a sanitised view of ESR_EL2 and, where it is
> > > unambiguous, common fault flags that have shared definitions with x86.
> > 
> > Me confused, this is what the above does?  "struct kvm_page_fault" is arch
> > specific, e.g. x86 has a whole pile of stuff in there beyond gfn, exec, write,
> > is_private, and slot.
> 
> Right, but now I need to remember that some of the hardware syndrome
> (exec, write) is handled in the arch-neutral code and the rest belongs
> to the arch.

Yeah, can't argue there.

> > The approach is non-standard, but I think my justification/reasoning for having
> > the structure be arch-defined still holds:
> > 
> >  : Rather than define a common kvm_page_fault and kvm_arch_page_fault child,
> >  : simply assert that the handful of required fields are provided by the
> >  : arch-defined structure.  Unlike vCPU and VMs, the number of common fields
> >  : is expected to be small, and letting arch code fully define the structure
> >  : allows for maximum flexibility with respect to const, layout, etc.
> > 
> > If we could use anonymous struct fields, i.e. could embed a kvm_arch_page_fault
> > without having to bounce through an "arch" field, I would vote for the approach.
> > Sadly, AFAIK, we can't yet use those in the kernel.
> 
> The general impression is that this is an unnecessary amount of complexity
> for doing something trivial (computing flags).

It looks pretty though!

> > Nothing prevents arm64 (or any arch) from wrapping kvm_prepare_memory_fault_exit()
> > and/or taking action after it's invoked.  That's not an accident; the "prepare
> > exit" helpers (x86 has a few more) were specifically designed to not be used as
> > the "return" to userspace.  E.g. this one returns "void" instead of -EFAULT
> > specifically so that the caller isn't "required" to ignore the return if the
> > caller wants to populate (or change, but hopefully that's never the case) fields
> > after calling kvm_prepare_memory_fault_exit(), and so that arch can return an
> > entirely different error code, e.g. -EHWPOISON when appropriate.
> 
> IMO, this does not achieve the desired layering / ownership of memory
> fault triage. This would be better organized as the arch code computing
> all of the flags relating to the hardware syndrome (even boring ones
> like RWX) 

Just to make sure I'm not misinterpreting things, by "computing all of the flags",
you mean computing KVM_MEMORY_EXIT_FLAG_xxx flags that are derived from hardware
state, correct?

> and arch-neutral code potentially lending a hand with the software bits.
>
> With this I either need to genericize the horrors of the Arm
> architecture in the common thing or keep track of what parts of the
> hardware flags are owned by arch v. non-arch. SW v. HW fault context is
> a cleaner split, IMO.

The problem I'm struggling with is where to draw the line.  If we leave hardware
state to arch code, then we're not left with much.  Hmm, but it really is just
the gfn/gpa that's needed in common code to avoid true ugliness.  The size is
technically arch specific, but the reported size is effectively a placeholder,
i.e. it's always PAGE_SIZE, and probably always will be PAGE_SIZE, but we wanted
to give ourselves an out if necessary.

Would you be ok having common code fill gpa and size?  If so, then we can do this:

--
void kvm_arch_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
					struct kvm_page_fault *fault);

static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
						 struct kvm_page_fault *fault)
{
	KVM_ASSERT_TYPE_IS(gfn_t, fault->gfn);

	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
	vcpu->run->memory_fault.gpa = fault->gfn << PAGE_SHIFT;
	vcpu->run->memory_fault.size = PAGE_SIZE;

	vcpu->run->memory_fault.flags = 0;
	kvm_arch_prepare_memory_fault_exit(vcpu, fault);
}
--

where arm64's arch hook is empty, and x86's is:

--
static inline void kvm_arch_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
						      struct kvm_page_fault *fault)
{
	if (fault->is_private)
		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
--

It's not perfect, but it should be much easier to describe the contract, and
common code can still pass around a kvm_page_fault structure instead of a horde
of booleans.
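
To make the proposed contract concrete, here is a minimal userspace sketch of the split described above: common code fills exit_reason, gpa, and size, then hands off to an arch hook that owns the flags. Everything prefixed with demo_/DEMO_ is a simplified stand-in invented for illustration (including the constant values and the _Generic-based type check standing in for KVM_ASSERT_TYPE_IS); none of it is the kernel's actual definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder constants; assumptions for this sketch, not the uapi values. */
#define DEMO_PAGE_SHIFT			12
#define DEMO_PAGE_SIZE			(1ULL << DEMO_PAGE_SHIFT)
#define DEMO_EXIT_MEMORY_FAULT		39
#define DEMO_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)

typedef uint64_t demo_gfn_t;

/* Stand-in for the memory_fault portion of the run structure. */
struct demo_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t size;
	} memory_fault;
};

struct demo_vcpu {
	struct demo_run *run;
};

/* Arch-defined fault structure; common code only requires "gfn". */
struct demo_page_fault {
	demo_gfn_t gfn;
	bool is_private;	/* x86-style extra field */
};

/* Compile-time check that the arch structure provides a gfn of the right type. */
static_assert(_Generic(((struct demo_page_fault *)0)->gfn,
		       demo_gfn_t: 1, default: 0),
	      "arch page fault structure must provide a demo_gfn_t gfn");

/* Arch hook (x86 flavor): owns every flag derived from arch state. */
static void demo_arch_prepare_memory_fault_exit(struct demo_vcpu *vcpu,
						struct demo_page_fault *fault)
{
	if (fault->is_private)
		vcpu->run->memory_fault.flags |= DEMO_MEMORY_EXIT_FLAG_PRIVATE;
}

/* Common helper: fills gpa and size, zeroes flags, then calls the arch hook. */
static void demo_prepare_memory_fault_exit(struct demo_vcpu *vcpu,
					   struct demo_page_fault *fault)
{
	vcpu->run->exit_reason = DEMO_EXIT_MEMORY_FAULT;
	vcpu->run->memory_fault.gpa = fault->gfn << DEMO_PAGE_SHIFT;
	vcpu->run->memory_fault.size = DEMO_PAGE_SIZE;

	vcpu->run->memory_fault.flags = 0;
	demo_arch_prepare_memory_fault_exit(vcpu, fault);
}

/*
 * Hypothetical caller: the helper returns void, so the caller picks the
 * error code and is free to adjust fields before returning.
 */
static int demo_handle_fault(struct demo_vcpu *vcpu,
			     struct demo_page_fault *fault)
{
	demo_prepare_memory_fault_exit(vcpu, fault);
	return -EFAULT;
}
```

An arm64-style build of this sketch would leave the body of demo_arch_prepare_memory_fault_exit() empty, so flags would stay 0 unless the caller sets them after the helper returns.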