Message-ID: <aMfMk/x5XJ1bfvzv@yzhao56-desk.sh.intel.com>
Date: Mon, 15 Sep 2025 16:21:39 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <reinette.chatre@...el.com>,
	<rick.p.edgecombe@...el.com>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] KVM: selftests: Test prefault memory during
 concurrent memslot removal

On Mon, Sep 08, 2025 at 04:47:23PM -0700, Sean Christopherson wrote:
> On Fri, Aug 22, 2025, Yan Zhao wrote:
> >  .../selftests/kvm/pre_fault_memory_test.c     | 94 +++++++++++++++----
> >  1 file changed, 78 insertions(+), 16 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
> > index 0350a8896a2f..56e65feb4c8c 100644
> > --- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
> > +++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
> > @@ -10,12 +10,16 @@
> >  #include <test_util.h>
> >  #include <kvm_util.h>
> >  #include <processor.h>
> > +#include <pthread.h>
> >  
> >  /* Arbitrarily chosen values */
> >  #define TEST_SIZE		(SZ_2M + PAGE_SIZE)
> >  #define TEST_NPAGES		(TEST_SIZE / PAGE_SIZE)
> >  #define TEST_SLOT		10
> >  
> > +static bool prefault_ready;
> > +static bool delete_thread_ready;
> > +
> >  static void guest_code(uint64_t base_gpa)
> >  {
> >  	volatile uint64_t val __used;
> > @@ -30,17 +34,47 @@ static void guest_code(uint64_t base_gpa)
> >  	GUEST_DONE();
> >  }
> >  
> > -static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
> > -			     u64 left)
> > +static void *remove_slot_worker(void *data)
> > +{
> > +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> > +
> > +	WRITE_ONCE(delete_thread_ready, true);
> > +
> > +	while (!READ_ONCE(prefault_ready))
> > +		cpu_relax();
> > +
> > +	vm_mem_region_delete(vcpu->vm, TEST_SLOT);
> > +
> > +	WRITE_ONCE(delete_thread_ready, false);
> 
> Rather than use global variables, which necessitates these "dances" to get things
> back to the initial state, use an on-stack structure to communicate (and obviously
> make sure the structure is initialized :-D).
Sorry for the late reply.

Indeed, this makes the code more elegant!
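
For illustration only, a minimal standalone sketch of the on-stack approach suggested above. It is not the selftest code: the struct name, its fields, and run_handshake() are hypothetical, C11 atomics stand in for READ_ONCE()/WRITE_ONCE(), and the vm_mem_region_delete() call is replaced by setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; the field names are illustrative only. */
struct slot_worker_data {
	atomic_bool worker_ready;	/* set by the worker once it is running */
	atomic_bool prefault_ready;	/* set by the main thread to release the worker */
	atomic_bool slot_deleted;	/* set by the worker after the "delete" step */
};

static void *slot_worker(void *arg)
{
	struct slot_worker_data *data = arg;

	atomic_store(&data->worker_ready, true);

	while (!atomic_load(&data->prefault_ready))
		sched_yield();

	/* The real test would call vm_mem_region_delete() here. */
	atomic_store(&data->slot_deleted, true);
	return NULL;
}

/* Runs one handshake; returns true if the worker reached the delete step. */
static bool run_handshake(void)
{
	struct slot_worker_data data;
	pthread_t thread;
	int rc;

	/* Explicit initialization: on-stack state starts out indeterminate. */
	atomic_init(&data.worker_ready, false);
	atomic_init(&data.prefault_ready, false);
	atomic_init(&data.slot_deleted, false);

	rc = pthread_create(&thread, NULL, slot_worker, &data);
	assert(rc == 0);

	while (!atomic_load(&data.worker_ready))
		sched_yield();

	atomic_store(&data.prefault_ready, true);
	pthread_join(thread, NULL);

	return atomic_load(&data.slot_deleted);
}
```

Because every call gets a fresh structure, run_handshake() can be invoked repeatedly without resetting any flag at the end of the worker, which is the "dance" the global variables required.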

> > +	return NULL;
> > +}
> > +
> > +static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 base_gpa, u64 offset,
> > +			     u64 size, u64 left, bool private, bool remove_slot)
> >  {
> >  	struct kvm_pre_fault_memory range = {
> > -		.gpa = gpa,
> > +		.gpa = base_gpa + offset,
> >  		.size = size,
> >  		.flags = 0,
> >  	};
> > -	u64 prev;
> > +	pthread_t remove_thread;
> > +	bool remove_hit = false;
> >  	int ret, save_errno;
> > +	u64 prev;
> >  
> > +	if (remove_slot) {
> 
> I don't see any reason to make the slot removal conditional.  There are three
> things we're interested in testing (so far):
> 
>  1. Success
>  2. ENOENT due to no memslot
>  3. EAGAIN due to INVALID memslot
> 
> #1 and #2 are mutually exclusive, or rather easier to test via separate testcases
> (because writing to non-existent memory is trivial).  But for #3, I don't see a
> reason to make it mutually exclusive with #1 _or_ #2.
> 
> As written, it's always mutually exclusive with #2 because otherwise it would be
> difficult (impossible?) to determine if KVM exited on the "right" address.  But
> the only reason that's true is because the test recreates the slot *after*
> prefaulting, and _that_ makes #3 _conditionally_ mutually exclusive with #1,
> i.e. the test doesn't validate success if the INVALID memslot race is hit.
> 
> Rather than make everything mutually exclusive, just restore the memslot and
> retry prefaulting.  That also gives us easy bonus coverage that doing
> KVM_PRE_FAULT_MEMORY on memory that has already been faulted in is idempotent,
> i.e. that KVM_PRE_FAULT_MEMORY succeeds if it already succeeded (and nothing
> nuked the mappings in the interim).
That's a good idea.

> If the memslot is restored and the loop retries, then #3 becomes a complementary
> and orthogonal testcase to #1 and #2.
> 
> This?  (with an opportunistic s/left/expected_left that confused me; I thought
> "left" meant how many bytes were left to prefault, but it actually means how many
> bytes are expected to be left when failure occurs).
Looks good to me, except for a minor bug.

> +		if (!slot_recreated) {
> +			WRITE_ONCE(data.recreate_slot, true);
> +			pthread_join(slot_worker, NULL);
> +			slot_recreated = true;
> +			continue;
If scheduling delays cause delete_slot_worker() to invoke vm_mem_region_delete()
late, __vcpu_ioctl() may already have returned 0 with range.size == 0 at this
point, i.e. the whole range was prefaulted before the slot was removed.

What about checking range.size before continuing?

@@ -120,7 +126,8 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 base_gpa, u64 offset,
                        WRITE_ONCE(data.recreate_slot, true);
                        pthread_join(slot_worker, NULL);
                        slot_recreated = true;
-                       continue;
+                       if (range.size)
+                               continue;
                }


Otherwise, the next __vcpu_ioctl() would be issued with range.size == 0 and
return -1 with errno == EINVAL, which would break the assertion below.
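
To make the failure mode concrete, here is a simplified standalone model of the retry logic. Everything in it is an assumption for illustration: fake_pre_fault() is a mock that only mimics the two behaviors described in this mail (a successful call consumes the range, a zero-size call fails with EINVAL), and prefault_loop() is a guess at the loop shape, not the actual selftest.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_pre_fault_memory. */
struct fake_range {
	uint64_t gpa;
	uint64_t size;
};

/*
 * Mock of the ioctl: rejects a zero-size range with EINVAL, otherwise
 * "prefaults" everything and advances gpa/size the way a full success would.
 */
static int fake_pre_fault(struct fake_range *range)
{
	if (!range->size) {
		errno = EINVAL;
		return -1;
	}
	range->gpa += range->size;
	range->size = 0;
	return 0;
}

/*
 * Models the case where the first call completes before the slot is
 * deleted.  With check_size == false the loop always retries once after
 * recreating the slot; with check_size == true it retries only if bytes
 * remain.
 */
static int prefault_loop(struct fake_range *range, bool check_size)
{
	bool slot_recreated = false;
	int ret;

	for (;;) {
		ret = fake_pre_fault(range);

		if (!slot_recreated) {
			/* The real test would recreate the slot and join the worker here. */
			slot_recreated = true;
			if (!check_size || range->size)
				continue;
		}
		break;
	}
	return ret;
}
```

In this model the unconditional retry ends with -1/EINVAL even though the first call prefaulted everything, while the variant that checks range.size keeps the successful return value.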
	
> +	/*
> +	 * Assert success if prefaulting the entire range should succeed, i.e.
> +	 * complete with no bytes remaining.  Otherwise prefaulting should have
> +	 * failed due to ENOENT (due to RET_PF_EMULATE for emulated MMIO when
> +	 * no memslot exists).
> +	 */
> +	if (!expected_left)
> +		TEST_ASSERT_VM_VCPU_IOCTL(!ret, KVM_PRE_FAULT_MEMORY, ret, vcpu->vm);