Message-ID: <4f8b0d49-f81c-4716-a60f-2a0d7042badd@redhat.com>
Date: Tue, 23 Sep 2025 18:23:54 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org
Cc: Sean Christopherson <seanjc@...gle.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
x86@...nel.org, Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in
kvm_arch_async_page_present_queued
On 9/23/25 17:58, Paolo Bonzini wrote:
> - on the other side, after clearing pageready_pending, there will be a
> check for a wakeup:
>
>     WRITE_ONCE(pageready_pending, false);
>     smp_mb();
>     if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
>         kvm_check_async_pf_completion(vcpu);
>
> except that the "if" is not in kvm_set_msr_common(); it will happen
> naturally as part of the first re-entry.

One important thing I forgot to mention: the above only covers the race
case. There is also the case where KVM_REQ_APF_READY has already been
cleared, and for that one the call to kvm_check_async_pf_completion()
is *also* needed in kvm_set_msr_common().
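
Concretely, both pieces could end up looking like this; this is only a
sketch, assuming the clearing keeps happening in the
MSR_KVM_ASYNC_PF_ACK case of kvm_set_msr_common() as it does today:

	case MSR_KVM_ASYNC_PF_ACK:
		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF_INT))
			return 1;
		if (data & 0x1) {
			/*
			 * Pairs with smp_mb__after_atomic() in
			 * kvm_arch_async_page_present_queued(): publish the
			 * cleared flag before re-checking for completions.
			 */
			smp_store_mb(vcpu->arch.apf.pageready_pending, false);
			/*
			 * Needed even without the race: KVM_REQ_APF_READY
			 * may have been consumed already.
			 */
			kvm_check_async_pf_completion(vcpu);
		}
		break;
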
Paolo
>
> So let's look at the changes you need to make so that the code ends up
> looking like the above.
>
> - using READ_ONCE/WRITE_ONCE for pageready_pending never hurts
>
> - here in kvm_arch_async_page_present_queued(), a smp_mb__after_atomic()
> (compiler barrier on x86) is missing after kvm_make_request():
>
>     kvm_make_request(KVM_REQ_APF_READY, vcpu);
>     /*
>      * Tell the vCPU to wake up before checking whether it needs
>      * an interrupt.  Pairs with any memory barrier between the
>      * clearing of pageready_pending and vCPU entry.
>      */
>     smp_mb__after_atomic();
>     if (!READ_ONCE(vcpu->arch.apf.pageready_pending))
>         kvm_vcpu_kick(vcpu);
>
> - in kvm_set_msr_common(), there are two possibilities.
> The easy one is to just use smp_store_mb() to clear
> vcpu->arch.apf.pageready_pending. The other is to keep the plain
> WRITE_ONCE() and document the implicit barrier with a comment
> like this:
>
>     WRITE_ONCE(vcpu->arch.apf.pageready_pending, false);
>     /*
>      * Ensure the other side knows to wake this vCPU up, before
>      * the vCPU next checks KVM_REQ_APF_READY.  Use an existing
>      * memory barrier between here and the next
>      * kvm_request_pending(), for example in vcpu_run().
>      */
>     /* smp_mb(); */
>
> plus a memory barrier in common code like this:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 706b6fd56d3c..e302c617e4b2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11236,6 +11236,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>          if (r <= 0)
>              break;
>  
> +        /*
> +         * Provide a memory barrier between handle_exit and the
> +         * kvm_request_pending() read in vcpu_enter_guest().  It
> +         * pairs with any barrier after kvm_make_request(), for
> +         * example in kvm_arch_async_page_present_queued().
> +         */
> +        smp_mb__before_atomic();
>          kvm_clear_request(KVM_REQ_UNBLOCK, vcpu);
>          if (kvm_xen_has_pending_events(vcpu))
>              kvm_xen_inject_pending_events(vcpu);
>
>
> The only advantage of this second, more complex approach is that
> it shows *why* the race was not happening. The 50 clock cycles
> saved on an MSR write are not worth the extra complication, and
> on a quick grep I could not find other cases which rely on the same
> implicit barriers. So I'd say use smp_store_mb(), with a comment
> about the pairing with kvm_arch_async_page_present_queued(); and write
> in the commit message that the race wasn't happening thanks to unrelated
> memory barriers between handle_exit and the kvm_request_pending()
> read in vcpu_enter_guest.
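>
> To make the pairing fully explicit, here is the whole pattern in
> schematic form (just a sketch that relabels the snippets above):
>
>     /* completion side: kvm_arch_async_page_present_queued() */
>     kvm_make_request(KVM_REQ_APF_READY, vcpu);              /* store A */
>     smp_mb__after_atomic();
>     if (!READ_ONCE(vcpu->arch.apf.pageready_pending))       /* load B */
>         kvm_vcpu_kick(vcpu);
>
>     /* vCPU side: kvm_set_msr_common() plus the next entry */
>     smp_store_mb(vcpu->arch.apf.pageready_pending, false);  /* store B */
>     if (kvm_check_request(KVM_REQ_APF_READY, vcpu))         /* load A */
>         kvm_check_async_pf_completion(vcpu);
>
> Each side stores its own flag, executes a full barrier, and only then
> loads the other side's flag, so at least one of the two loads must
> observe the other side's store: either the completion side sees
> pageready_pending clear and kicks the vCPU, or the vCPU sees the
> request and checks for completions. A wakeup cannot be lost.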
>
> Thanks,
>
> Paolo
>
>