Message-ID: <88c65b89-5174-4076-82cd-7852c8c25b66@intel.com>
Date: Tue, 11 Jun 2024 16:07:12 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <isaku.yamahata@...el.com>, <pbonzini@...hat.com>,
	<erdemaktas@...gle.com>, <vkuznets@...hat.com>, <vannapurve@...gle.com>,
	<jmattson@...gle.com>, <mlevitsk@...hat.com>, <xiaoyao.li@...el.com>,
	<chao.gao@...el.com>, <rick.p.edgecombe@...el.com>, <yuan.yao@...el.com>,
	<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V8 1/2] KVM: selftests: Add x86_64 guest udelay() utility

Hi Sean,

On 6/11/24 3:03 PM, Sean Christopherson wrote:
> On Tue, Jun 11, 2024, Reinette Chatre wrote:
>>> Heh, the docs are stale.  KVM hasn't returned an error since commit cc578287e322
>>> ("KVM: Infrastructure for software and hardware based TSC rate scaling"), which
>>> again predates selftests by many years (6+ in this case).  To make our lives
>>> much simpler, I think we should assert that KVM_GET_TSC_KHZ succeeds, and maybe
>>> throw in a GUEST_ASSERT(tsc_khz) in udelay()?
>>
>> I added the GUEST_ASSERT() but I find that it comes with a caveat (more below).
>>
>> I plan an assert as below that would end up testing the same as what a
>> GUEST_ASSERT(tsc_khz) would accomplish:
>>
>> 	r = __vm_ioctl(vm, KVM_GET_TSC_KHZ, NULL);
>> 	TEST_ASSERT(r > 0, "KVM_GET_TSC_KHZ did not provide a valid TSC freq.");
>> 	tsc_khz = r;
>>
>>
>> Caveat is: the additional GUEST_ASSERT() subtly requires every test that uses
>> udelay() in the guest to implement a ucall (UCALL_ABORT) handler. A crude grep
>> shows that 47 of the 69 x86_64 tests do have a UCALL_ABORT handler. If any of
>> the others use udelay() then the GUEST_ASSERT() will of course still trigger,
>> but the resulting failure will be quite cryptic. For example,
>> "Unhandled exception '0xe' at guest RIP '0x0'" vs. "tsc_khz".
> 
> Yeah, we really need to add a bit more infrastructure, there is way, way, waaaay
> too much boilerplate needed just to run a guest and handle the basic ucalls.
> Reporting guest asserts should Just Work for 99.9% of tests.
> 
> Anyways, is it any less cryptic if ucall_assert() forces a failure?  I forget if
> the problem with an unhandled GUEST_ASSERT() is that the test re-enters the guest,
> or if it's something else.
> 
> I don't think we need a perfect solution _now_, as tsc_khz really should never
> be 0, just something to not make life completely miserable for future developers.
> 
> diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
> index 42151e571953..1116bce5cdbf 100644
> --- a/tools/testing/selftests/kvm/lib/ucall_common.c
> +++ b/tools/testing/selftests/kvm/lib/ucall_common.c
> @@ -98,6 +98,8 @@ void ucall_assert(uint64_t cmd, const char *exp, const char *file,
>   
>          ucall_arch_do_ucall((vm_vaddr_t)uc->hva);
>   
> +       ucall_arch_do_ucall(GUEST_UCALL_FAILED);
> +
>          ucall_free(uc);
>   }
> 

Thank you very much.

With your suggestion, an unhandled GUEST_ASSERT() looks as below. It does
not indicate what (beyond vcpu_run()) triggered the assert, but it does
hint that adding ucall handling may be needed.

[SNIP]
==== Test Assertion Failure ====
   lib/ucall_common.c:154: addr != (void *)GUEST_UCALL_FAILED
   pid=16002 tid=16002 errno=4 - Interrupted system call
      1  0x000000000040da91: get_ucall at ucall_common.c:154
      2  0x0000000000410142: assert_on_unhandled_exception at processor.c:614
      3  0x0000000000406590: _vcpu_run at kvm_util.c:1718
      4   (inlined by) vcpu_run at kvm_util.c:1729
      5  0x00000000004026cf: test_apic_bus_clock at apic_bus_clock_test.c:115
      6   (inlined by) run_apic_bus_clock_test at apic_bus_clock_test.c:164
      7   (inlined by) main at apic_bus_clock_test.c:201
      8  0x00007fb1d8429d8f: ?? ??:0
      9  0x00007fb1d8429e3f: ?? ??:0
     10  0x00000000004027a4: _start at ??:?
   Guest failed to allocate ucall struct
[SNIP]
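
For readers unfamiliar with the mechanism, the idea behind the extra
ucall can be modeled roughly as below. This is a standalone sketch, not
the selftests code: every sketch_* name, the queue standing in for VM
exits, and the sentinel value are assumptions made for illustration.
The guest reports the assert and then, if it is ever resumed, reports a
sentinel that the host side refuses to accept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel, standing in for GUEST_UCALL_FAILED. */
#define SKETCH_UCALL_FAILED	UINT64_MAX
#define SKETCH_QUEUE_LEN	4

/* Values the modeled guest handed to the host, oldest first. */
struct sketch_vcpu {
	uint64_t exits[SKETCH_QUEUE_LEN];
	unsigned int head;
	unsigned int count;
};

/* Guest side: one ucall, modeled as appending one value to the queue. */
static void sketch_do_ucall(struct sketch_vcpu *vcpu, uint64_t val)
{
	assert(vcpu->count < SKETCH_QUEUE_LEN);
	vcpu->exits[(vcpu->head + vcpu->count) % SKETCH_QUEUE_LEN] = val;
	vcpu->count++;
}

/*
 * Guest side: report the assert, then report the sentinel. The second
 * ucall is only observed if the host resumes the guest without having
 * handled the first one.
 */
static void sketch_guest_assert(struct sketch_vcpu *vcpu, uint64_t uc_addr)
{
	sketch_do_ucall(vcpu, uc_addr);
	sketch_do_ucall(vcpu, SKETCH_UCALL_FAILED);
}

/*
 * Host side: fetch the next ucall. Returns 0 and sets *addr for a normal
 * ucall, -1 when the sentinel is seen, 1 when nothing is pending.
 */
static int sketch_get_ucall(struct sketch_vcpu *vcpu, uint64_t *addr)
{
	uint64_t val;

	if (!vcpu->count)
		return 1;

	val = vcpu->exits[vcpu->head];
	vcpu->head = (vcpu->head + 1) % SKETCH_QUEUE_LEN;
	vcpu->count--;

	if (val == SKETCH_UCALL_FAILED)
		return -1;

	*addr = val;
	return 0;
}

/* A test with an abort handler stops at the first exit. Returns 1 if seen. */
static int sketch_run_with_handler(void)
{
	struct sketch_vcpu vcpu = { .head = 0, .count = 0 };
	uint64_t addr = 0;

	sketch_guest_assert(&vcpu, 0x1000);
	return sketch_get_ucall(&vcpu, &addr) == 0 && addr == 0x1000;
}

/* A test without a handler ignores the first exit and re-enters the guest. */
static int sketch_run_without_handler(void)
{
	struct sketch_vcpu vcpu = { .head = 0, .count = 0 };
	uint64_t addr = 0;

	sketch_guest_assert(&vcpu, 0x1000);
	(void)sketch_get_ucall(&vcpu, &addr);	/* assert report dropped */
	return sketch_get_ucall(&vcpu, &addr) == -1;
}
```

In this model both helpers return 1: the handled case sees the original
report, and the unhandled case trips over the sentinel on the next
fetch, which corresponds to the "addr != (void *)GUEST_UCALL_FAILED"
failure in the log above.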

Is this acceptable? I can add a new preparatory patch with your
suggestion, with the goal of providing a slightly better error message
when there is an unhandled ucall.

Reinette
