linux-kernel - Re: [PATCH] KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZcHzrqpoUlUl5OhU@x1n>
Date: Tue, 6 Feb 2024 16:54:06 +0800
From: Peter Xu <peterx@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, Shaoqin Huang <shahuang@...hat.com>
Subject: Re: [PATCH] KVM: selftests: Fix a semaphore imbalance in the dirty
 ring logging test

On Fri, Feb 02, 2024 at 03:18:31PM -0800, Sean Christopherson wrote:
> When finishing the final iteration of dirty_log_test testcase, set
> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> from the dirty ring testcase.  This fixes a bug where the extra post to
> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> runs of the testcases.  The bug likely was missed during development as
> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> testcases after the dirty ring test, because for_each_guest_mode() only
> runs a single iteration.
> 
> For the regular dirty log testcases, letting the vCPU run one extra
> iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
> only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
> But for the dirty ring test, which needs to periodically stop the vCPU to
> reap the dirty ring, letting the vCPU resume the guest _after_ the last
> iteration means the vCPU will get stuck without an extra "continue".
> 
> However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
> be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
> the guest.  This results in a dangling sem_vcpu_cont, which leads to
> subsequent iterations getting out of sync, as the vCPU worker will
> continue on before the main task is ready for it to resume the guest,
> leading to a variety of asserts, e.g.
> 
>   ==== Test Assertion Failure ====
>   dirty_log_test.c:384: dirty_ring_vcpu_ring_full
>   pid=14854 tid=14854 errno=22 - Invalid argument
>      1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
>      2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
>      3   (inlined by) run_test at dirty_log_test.c:802
>      4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
>      5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
>      6  0x0000ffff9be173c7: ?? ??:0
>      7  0x0000ffff9be1749f: ?? ??:0
>      8  0x000000000040206f: _start at ??:?
>   Didn't continue vcpu even without ring full
> 
> Alternatively, the test could simply reset the semaphores before each
> testcase, but papering over hacks with more hacks usually ends in tears.

IMHO this is fine; we don't have a hard requirement anyway on sem in this
test, and we're pretty sure where the extra sem count came from (rather
than an unknown cause).

However since the patch can drop before_vcpu_join() by a proper ordering of
setup host_quit v.s. sem_post(), I think it's at least equally nice indeed.

> 
> Reported-by: Shaoqin Huang <shahuang@...hat.com>
> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>

Reviewed-by: Peter Xu <peterx@...hat.com>

Thanks,

-- 
Peter Xu