linux-kernel - Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <diqz7bxfsz6l.fsf@google.com>
Date: Wed, 01 Oct 2025 10:18:10 +0000
From: Ackerley Tng <ackerleytng@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Janosch Frank <frankja@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>, 
	Fuad Tabba <tabba@...gle.com>
Subject: Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap()
 to assert success

Sean Christopherson <seanjc@...gle.com> writes:

> On Tue, Sep 30, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@...gle.com> writes:
>> > To be perfectly honest, I forgot test_util.h existed :-)
>>
>> Merging/dropping one of kvm_util.h vs test_util.h is a good idea. The
>> distinction is not clear and it's already kind of messy between the two.
>
> That's a topic for another day.
>
>> It's a common pattern in KVM selftests to have a syscall/ioctl wrapper
>> foo() that asserts defaults and a __foo() that doesn't assert anything
>> and allows tests to assert something else, but I have a contrary
>> opinion.
>>
>> I think it's better that tests be explicit about what they're testing
>> for, so perhaps it's better to use macros like TEST_ASSERT_EQ() to
>> explicitly call a function and check the results.
>
> No, foo() and __foo() is a well-established pattern in the kernel, and in KVM
> selftests it is a very well-established pattern for syscalls and ioctls.  And
> I feel very, very strong about handling errors in the core infrastructure.
>
> Relying on developers to remember to add an assert is 100% guaranteed to result
> in missed asserts.  That makes everyone's life painful, because inevitably an
> ioctl will fail on someone else's system, and then they're stuck debugging a
> super random failure with no insight into what the developer _meant_ to do.
>
> And requiring developers to write (i.e. copy+paste) boring, uninteresting code
> to handle failures adds a lot of friction to development, is a terrible use of
> developers' time, and results in _awful_ error messages.  Bad or missing error
> messages in tests have easily wasted tens of hours of just _my_ time; I suspect
> the total cost throughout the KVM community can be measured in tens of days.
>
> E.g. pop quiz, what state did I clobber that generated this error message with
> a TEST_ASSERT_EQ(ret, 0)?  Answer at the bottom.
>
>   ==== Test Assertion Failure ====
>   lib/x86/processor.c:1128: ret == 0
>   pid=2456 tid=2456 errno=22 - Invalid argument
>      1	0x0000000000415465: vcpu_load_state at processor.c:1128
>      2	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
>      3	0x000000000040204d: main at hyperv_evmcs.c:286
>      4	0x000000000041df43: __libc_start_call_main at libc-start.o:?
>      5	0x00000000004200ec: __libc_start_main_impl at ??:?
>      6	0x0000000000402220: _start at ??:?
>   0xffffffffffffffff != 0 (ret != 0)
>
> You might say "oh, I can go look at the source".  But what if you don't have the
> source because you got a test failure from CI?  Or because the assert came from
> a bug report due to a failure in someone else's CI pipeline?
>
> That is not a contrived example.  Before the ioctl assertion framework was added,
> KVM selftests was littered with such garbage.  Note, I'm not blaming developers
> in any way.  After having to add tens of asserts on KVM ioctls just to write a
> simple test, it's entirely natural to become fatigued and start throwing in
> TEST_ASSERT_EQ(ret, 0) or TEST_ASSERT(!ret, "ioctl failed").
>
> There's also the mechanics of requiring the caller to assert.  KVM ioctls that
> return a single value, e.g. register accessors, then need to use an out-param to
> communicate the value or error code, e.g. this
>
> 	val = vcpu_get_reg(vcpu, reg_id);
> 	TEST_ASSERT_EQ(val, 0);
>
> would become this:
>
> 	ret = vcpu_get_reg(vcpu, reg_id, &val);
> 	TEST_ASSERT_EQ(ret, 0);
> 	TEST_ASSERT_EQ(val, 0);
>
> But of course, the developer would bundle that into:
>
> 	TEST_ASSERT(!ret && !val, "get_reg failed");
>
> And then the user is really sad when the "!val" condition fails, because they
> can't even tell.  Again, this't a contrived example, it literally happend to me
> when dealing with the guest_memfd NUMA testcase, and was what prompted me to
> write this syscall framework.  This also shows the typical error message that a
> developer will write.
>
> This TEST_ASSERT() failed on me due to a misguided cleanup I made:
>
> 	ret = syscall(__NR_get_mempolicy, &get_policy, &get_nodemask,
> 		      maxnode, mem, MPOL_F_ADDR);
> 	TEST_ASSERT(!ret && get_policy == MPOL_DEFAULT && get_nodemask == 0,
> 		"Policy should be MPOL_DEFAULT and nodes zero");
>
> generating this error message:
>
>   ==== Test Assertion Failure ====
>   guest_memfd_test.c:120: !ret && get_policy == MPOL_DEFAULT && get_nodemask == 0
>   pid=52062 tid=52062 errno=22 - Invalid argument
>      1	0x0000000000404113: test_mbind at guest_memfd_test.c:120 (discriminator 6)
>      2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:409 (discriminator 6)
>      3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:432
>      4	 (inlined by) main at guest_memfd_test.c:529
>      5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
>      6	0x0000000000420f4c: __libc_start_main_impl at ??:?
>      7	0x00000000004025c0: _start at ??:?
>   Policy should be MPOL_DEFAULT and nodes zero
>
> At first glance, it would appear that get_mempolicy() failed with -EINVAL.  Nope.
> ret==0, but errno was left set from an earlier syscall.  It took me a few minutes
> of digging and a run with strace to figure out that get_mempolicy() succeeded.
>
> Constrast that with:
>
>         kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR);
>         TEST_ASSERT(policy == MPOL_DEFAULT && !nodemask,
>                     "Wanted MPOL_DEFAULT (%u) and nodemask 0x0, got %u and 0x%lx",
>                     MPOL_DEFAULT, policy, nodemask);
>
>   ==== Test Assertion Failure ====
>   guest_memfd_test.c:120: policy == MPOL_DEFAULT && !nodemask
>   pid=52700 tid=52700 errno=22 - Invalid argument
>      1	0x0000000000404915: test_mbind at guest_memfd_test.c:120 (discriminator 6)
>      2	 (inlined by) __test_guest_memfd at guest_memfd_test.c:407 (discriminator 6)
>      3	0x0000000000402320: test_guest_memfd at guest_memfd_test.c:430
>      4	 (inlined by) main at guest_memfd_test.c:527
>      5	0x000000000041eda3: __libc_start_call_main at libc-start.o:?
>      6	0x0000000000420f4c: __libc_start_main_impl at ??:?
>      7	0x00000000004025c0: _start at ??:?
>   Wanted MPOL_DEFAULT (0) and nodemask 0x0, got 1 and 0x1
>
> Yeah, there's still some noise with errno=22, but it's fairly clear that the
> returned values mismatches, and super obvious that the syscall succeeded when
> looking at the code.  This is not a cherry-picked example.  There are hundreds,
> if not thousands, of such asserts in KVM selftests and KVM-Unit-Tests in
> particular.  And that's when developers _aren't_ forced to manually add boilerplate
> asserts in ioctls succeeding.
>
> For people that are completely new to KVM selftests, I can appreciate that it
> might take a while to acclimate to the foo() and __foo() pattern, but I have a
> hard time believing that it adds significant cognitive load after you've spent
> a decent amount of time in KVM selftests.  And I 100% want to cater to the people
> that are dealing with KVM selftests day in, day out.
>

Thanks for taking the time to write this up. I'm going to start a list
of "most useful explanations" and this will go on that list.

>> Or perhaps it should be more explicit, like in the name, that an
>> assertion is made within this function?
>
> No, that's entirely inflexible, will lead to confusion, and adds a copious amount
> of noise.  E.g. this
>
> 	/* emulate hypervisor clearing CR4.OSXSAVE */
> 	vcpu_sregs_get(vcpu, &sregs);
> 	sregs.cr4 &= ~X86_CR4_OSXSAVE;
> 	vcpu_sregs_set(vcpu, &sregs);
>
> versus
>
> 	/* emulate hypervisor clearing CR4.OSXSAVE */
> 	vcpu_sregs_get_assert(vcpu, &sregs);
> 	sregs.cr4 &= ~X86_CR4_OSXSAVE;
> 	vcpu_sregs_set_assert(vcpu, &sregs);
>
> The "assert" is pure noise and makes it harder to see the "get" versus "set".
>
> If we instead annotate the the "no_assert" case, then we'll end up with ambigous
> cases where a developer won't be able to determine if an unannotated API asserts
> or not, and conflict cases where a "no_assert" API _does_ assert, just not on the
> primary ioctl it's invoking.
>
> IMO, foo() and __foo() is quite explicit once you become accustomed to the
> environment.
>
>> In many cases a foo() exists without the corresponding __foo(), which
>> seems to be discouraging testing for error cases.
>
> That's almost always because no one has needed __foo().
>
>> Also, I guess especially for vcpu_run(), tests would like to loop/take
>> different actions based on different errnos and then it gets a bit
>> unwieldy to have to avoid functions that have assertions within them.
>
> vcpu_run() is a special case.  KVM_RUN is so much more than a normal ioctl, and
> so having vcpu_run() follow the "standard" pattern isn't entirely feasible.
>
> Speaking of vcpu_run(), and directly related to idea of having developers manually
> do TEST_ASSERT_EQ(), one of the top items on my selftests todo list is to have
> vcpu_run() handle GUEST_ASSERT and GUEST_PRINTF whenever possible.  Having to add
> UCALL_PRINTF handling just to get a debug message out of a test's guest code is
> beyond frustrating.  Ditto for the 60+ tests that had to manually add UCALL_ABORT
> handling, which leads to tests having code like this, which then gets copy+pasted
> all over the place and becomes a nightmare to maintain.

+1000 this is exactly where I had to avoid assertions!

>
> static void __vcpu_run_expect(struct kvm_vcpu *vcpu, unsigned int cmd)
> {
> 	struct ucall uc;
>
> 	vcpu_run(vcpu);
> 	switch (get_ucall(vcpu, &uc)) {
> 	case UCALL_ABORT:
> 		REPORT_GUEST_ASSERT(uc);
> 		break;
> 	default:
> 		if (uc.cmd == cmd)
> 			return;
>
> 		TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> 	}
> }
>
>> I can see people forgetting to add TEST_ASSERT_EQ()s to check results of
>> setup/teardown functions but I think those errors would surface some
>> other way anyway.
>
> Heh, I don't mean to be condescending, but I highly doubt you'll have this
> opinion after you've had to debug a completely unfamiliar test that's failing
> in weird ways, for the tenth time.
>
>> Not a strongly-held opinion,
>
> As you may have noticed, I have extremely strong opinions in this area :-)
>
>> and no major concerns on the naming either. It's a selftest after all and
>> IIUC we're okay to have selftest interfaces change anyway?
>
> Yes, changes are fine.  It's the churn I want to avoid.
>
> Oh, and here's the "answer" to the TEST_ASSERT_EQ() failure:
>
>   ==== Test Assertion Failure ====
>   include/kvm_util.h:794: !ret
>   pid=43866 tid=43866 errno=22 - Invalid argument
>      1	0x0000000000415486: vcpu_sregs_set at kvm_util.h:794 (discriminator 4)
>      2	 (inlined by) vcpu_load_state at processor.c:1125 (discriminator 4)
>      3	0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221
>      4	0x000000000040204d: main at hyperv_evmcs.c:286
>      5	0x000000000041dfc3: __libc_start_call_main at libc-start.o:?
>      6	0x000000000042016c: __libc_start_main_impl at ??:?
>      7	0x0000000000402220: _start at ??:?
>   KVM_SET_SREGS failed, rc: -1 errno: 22 (Invalid argument)