Message-ID: <4f8b0d49-f81c-4716-a60f-2a0d7042badd@redhat.com>
Date: Tue, 23 Sep 2025 18:23:54 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org
Cc: Sean Christopherson <seanjc@...gle.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
x86@...nel.org, Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in
kvm_arch_async_page_present_queued
On 9/23/25 17:58, Paolo Bonzini wrote:
> - on the other side, after clearing pageready_pending, there will be a
> check for a wakeup:
>
>     WRITE_ONCE(pageready_pending, false);
>     smp_mb();
>     if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
>         kvm_check_async_pf_completion(vcpu);
>
> except that the "if" is not in kvm_set_msr_common(); it will happen
> naturally as part of the first re-entry.

One important thing I forgot to mention: the above only covers the race
case. There is also the case where KVM_REQ_APF_READY has already been
cleared, and for that one the call to kvm_check_async_pf_completion()
is *also* needed in kvm_set_msr_common().
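
Concretely, both pieces could end up looking like this; this is only a
sketch, assuming the clearing keeps happening in the
MSR_KVM_ASYNC_PF_ACK case of kvm_set_msr_common() as it does today:

	case MSR_KVM_ASYNC_PF_ACK:
		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
			return 1;
		if (data & 0x1) {
			/*
			 * Pairs with smp_mb__after_atomic() in
			 * kvm_arch_async_page_present_queued(): publish the
			 * cleared flag before re-checking for completions.
			 */
			smp_store_mb(vcpu->arch.apf.pageready_pending, false);
			/*
			 * Needed even without the race: KVM_REQ_APF_READY
			 * may have been consumed already.
			 */
			kvm_check_async_pf_completion(vcpu);
		}
		break;
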
Paolo
>
> So let's look at the changes you need to make so that the code ends up
> looking like the above.
>
> - using READ_ONCE/WRITE_ONCE for pageready_pending never hurts
>
> - here in kvm_arch_async_page_present_queued(), a smp_mb__after_atomic()
> (compiler barrier on x86) is missing after kvm_make_request():
>
>     kvm_make_request(KVM_REQ_APF_READY, vcpu);
>     /*
>      * Tell the vCPU to wake up before checking whether it needs
>      * an interrupt.  Pairs with any memory barrier between the
>      * clearing of pageready_pending and vCPU entry.
>      */
>     smp_mb__after_atomic();
>     if (!READ_ONCE(vcpu->arch.apf.pageready_pending))
>         kvm_vcpu_kick(vcpu);
>
> - in kvm_set_msr_common(), there are two possibilities.
> The easy one is to just use smp_store_mb() to clear
> vcpu->arch.apf.pageready_pending. The other is to keep the plain
> WRITE_ONCE() and document the implicit barrier with a comment
> like this:
>
>     WRITE_ONCE(vcpu->arch.apf.pageready_pending, false);
>     /*
>      * Ensure the other side knows to wake this vCPU up, before
>      * the vCPU next checks KVM_REQ_APF_READY.  Use an existing
>      * memory barrier between here and the next
>      * kvm_request_pending(), for example in vcpu_run().
>      */
>     /* smp_mb(); */
>
> plus a memory barrier in common code like this:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 706b6fd56d3c..e302c617e4b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11236,6 +11236,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>          if (r <= 0)
>              break;
>  
> +        /*
> +         * Provide a memory barrier between handle_exit and the
> +         * kvm_request_pending() read in vcpu_enter_guest().  It
> +         * pairs with any barrier after kvm_make_request(), for
> +         * example in kvm_arch_async_page_present_queued().
> +         */
> +        smp_mb__before_atomic();
>          kvm_clear_request(KVM_REQ_UNBLOCK, vcpu);
>          if (kvm_xen_has_pending_events(vcpu))
>              kvm_xen_inject_pending_events(vcpu);
>
>
> The only advantage of this second, more complex approach is that
> it shows *why* the race was not happening. The 50 clock cycles
> saved on an MSR write are not worth the extra complication, and
> on a quick grep I could not find other cases which rely on the same
> implicit barriers. So I'd say use smp_store_mb(), with a comment
> about the pairing with kvm_arch_async_page_present_queued(); and write
> in the commit message that the race wasn't happening thanks to unrelated
> memory barriers between handle_exit and the kvm_request_pending()
> read in vcpu_enter_guest.
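>
> To make the pairing fully explicit, here is the whole pattern in
> schematic form (just a sketch that relabels the snippets above):
>
>     /* completion side: kvm_arch_async_page_present_queued() */
>     kvm_make_request(KVM_REQ_APF_READY, vcpu);              /* store A */
>     smp_mb__after_atomic();
>     if (!READ_ONCE(vcpu->arch.apf.pageready_pending))       /* load B */
>         kvm_vcpu_kick(vcpu);
>
>     /* vCPU side: kvm_set_msr_common() plus the next entry */
>     smp_store_mb(vcpu->arch.apf.pageready_pending, false);  /* store B */
>     if (kvm_check_request(KVM_REQ_APF_READY, vcpu))         /* load A */
>         kvm_check_async_pf_completion(vcpu);
>
> Each side stores its own flag, executes a full barrier, and only then
> loads the other side's flag, so at least one of the two loads must
> observe the other side's store: either the completion side sees
> pageready_pending clear and kicks the vCPU, or the vCPU sees the
> request and checks for completions. A wakeup cannot be lost.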
>
> Thanks,
>
> Paolo
>
>