[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aFNIPXoEb5iCjt_L@linux.dev>
Date: Wed, 18 Jun 2025 16:14:05 -0700
From: Oliver Upton <oliver.upton@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>
Cc: James Houghton <jthoughton@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>,
Yan Zhao <yan.y.zhao@...el.com>,
Nikita Kalyazin <kalyazin@...zon.com>,
Anish Moorthy <amoorthy@...gle.com>,
Peter Gonda <pgonda@...gle.com>, Peter Xu <peterx@...hat.com>,
David Matlack <dmatlack@...gle.com>, wei.w.wang@...el.com,
kvm@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev
Subject: Re: [PATCH v3 03/15] KVM: arm64: x86: Require "struct
kvm_page_fault" for memory fault exits
On Wed, Jun 18, 2025 at 01:47:36PM -0700, Sean Christopherson wrote:
> On Wed, Jun 18, 2025, Oliver Upton wrote:
> > On Wed, Jun 18, 2025 at 04:24:12AM +0000, James Houghton wrote:
> > > +#ifdef CONFIG_KVM_GENERIC_PAGE_FAULT
> > > +
> > > +#define KVM_ASSERT_TYPE_IS(type_t, x) \
> > > +do { \
> > > + type_t __maybe_unused tmp; \
> > > + \
> > > + BUILD_BUG_ON(!__types_ok(tmp, x) || !__typecheck(tmp, x)); \
> > > +} while (0)
> > > +
> > > static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > > - gpa_t gpa, gpa_t size,
> > > - bool is_write, bool is_exec,
> > > - bool is_private)
> > > + struct kvm_page_fault *fault)
> > > {
> > > + KVM_ASSERT_TYPE_IS(gfn_t, fault->gfn);
> > > + KVM_ASSERT_TYPE_IS(bool, fault->exec);
> > > + KVM_ASSERT_TYPE_IS(bool, fault->write);
> > > + KVM_ASSERT_TYPE_IS(bool, fault->is_private);
> > > + KVM_ASSERT_TYPE_IS(struct kvm_memory_slot *, fault->slot);
> > > +
> > > vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> > > - vcpu->run->memory_fault.gpa = gpa;
> > > - vcpu->run->memory_fault.size = size;
> > > + vcpu->run->memory_fault.gpa = fault->gfn << PAGE_SHIFT;
> > > + vcpu->run->memory_fault.size = PAGE_SIZE;
> > >
> > > /* RWX flags are not (yet) defined or communicated to userspace. */
> > > vcpu->run->memory_fault.flags = 0;
> > > - if (is_private)
> > > + if (fault->is_private)
> > > vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> > > }
> > > +#endif
> >
> > This *is not* the right direction of travel for arm64. Stage-2 aborts /
> > EPT violations / etc. are extremely architecture-specific events.
>
> Yes and no. 100% agreed there are arch/vendor specific aspects of stage-2 faults,
> but there are most definitely commonalites as well.
And I agree those commonalities should be expressed with the same flags
where possible.
> > What I would like to see on arm64 is that for every "KVM_EXIT_MEMORY_FAULT"
> > we provide as much syndrome information as possible. That could imply
> > some combination of a sanitised view of ESR_EL2 and, where it is
> > unambiguous, common fault flags that have shared definitions with x86.
>
> Me confused, this is what the above does? "struct kvm_page_fault" is arch
> specific, e.g. x86 has a whole pile of stuff in there beyond gfn, exec, write,
> is_private, and slot.
Right, but now I need to remember that some of the hardware syndrome
(exec, write) is handled in the arch-neutral code and the rest belongs
to the arch.
> The approach is non-standard, but I think my justification/reasoning for having
> the structure be arch-defined still holds:
>
> : Rather than define a common kvm_page_fault and kvm_arch_page_fault child,
> : simply assert that the handful of required fields are provided by the
> : arch-defined structure. Unlike vCPU and VMs, the number of common fields
> : is expected to be small, and letting arch code fully define the structure
> : allows for maximum flexibility with respect to const, layout, etc.
>
> If we could use anonymous struct field, i.e. could embed a kvm_arch_page_fault
> without having to bounce through an "arch" field, I would vote for the approach.
> Sadly, AFAIK, we can't yet use those in the kernel.
The general impression is that this is an unnecessary amount of
complexity for doing something trivial (computing flags).
> > This could incur some minor code duplication, but even then we can share
> > helpers for the software bits (like userfault).
>
> Again, that is what is proposed here.
>
> > FEAT_MTE_PERM is a very good example for this. There exists a "Tag"
> > permission at stage-2 which is unrelated to any of the 'normal'
> > read/write permissions. There's also the MostlyReadOnly permission from
> > FEAT_THE which grants write permission to a specific set of instructions.
> >
> > I don't want to paper over these nuances and will happily maintain an
> > arm64-specific flavor of "kvm_prepare_memory_fault_exit()"
>
> Nothing prevents arm64 (or any arch) from wrapping kvm_prepare_memory_fault_exit()
> and/or taking action after it's invoked. That's not an accident; the "prepare
> exit" helpers (x86 has a few more) were specifically designed to not be used as
> the "return" to userspace. E.g. this one returns "void" instead of -EFAULT
> specifically so that the callers isn't "required" to ignore the return if the
> caller wants to populate (or change, but hopefully that's never the case) fields
> after calling kvm_prepare_memory_fault_exit), and so that arch can return an
> entirely different error code, e.g. -EHWPOISON when appropriate.
IMO, this does not achieve the desired layering / ownership of memory
fault triage. This would be better organized as the arch code computing
all of the flags relating to the hardware syndrome (even boring ones
like RWX) and arch-neutral code potentially lending a hand with the
software bits.
With this I either need to genericize the horrors of the Arm
architecture in the common thing or keep track of what parts of the
hardware flags are owned by arch v. non-arch. SW v. HW fault context is
a cleaner split, IMO.
> And it's not just kvm_prepare_memory_fault_exit() that I want to use kvm_page_fault;
> kvm_faultin_pfn() is another case where having a common "struct kvm_page_fault"
> would clean up some ugly/annoying boilerplate.
That might be a better starting point for unifying these things, esp.
since kvm_faultin_pfn() doesn't have UAPI implications hiding behind it
and is already using common parameters.
Thanks,
Oliver
Powered by blists - more mailing lists