linux-kernel - Re: [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aNMpz96c9JOtPh-w@google.com>
Date: Tue, 23 Sep 2025 16:14:23 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Maxim Levitsky <mlevitsk@...hat.com>, kvm@...r.kernel.org, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org, 
	Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Fix a semi theoretical bug in kvm_arch_async_page_present_queued

On Tue, Sep 23, 2025, Paolo Bonzini wrote:
> On 9/23/25 20:55, Sean Christopherson wrote:
> > On Tue, Sep 23, 2025, Paolo Bonzini wrote:
> > > On 8/13/25 21:23, Maxim Levitsky wrote:
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 9018d56b4b0a..3d45a4cd08a4 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -13459,9 +13459,14 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
> > > >    void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu)
> > > >    {
> > > > -	kvm_make_request(KVM_REQ_APF_READY, vcpu);
> > > > -	if (!vcpu->arch.apf.pageready_pending)
> > > > +	/* Pairs with smp_store_release in vcpu_enter_guest. */
> > > > +	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
> > > > +	bool page_ready_pending = READ_ONCE(vcpu->arch.apf.pageready_pending);
> > > > +
> > > > +	if (!in_guest_mode || !page_ready_pending) {
> > > > +		kvm_make_request(KVM_REQ_APF_READY, vcpu);
> > > >    		kvm_vcpu_kick(vcpu);
> > > > +	}
> > > 
> > > Unlike Sean, I think the race exists in abstract and is not benign
> > 
> > How is it not benign?  I never said the race doesn't exist, I said that consuming
> > a stale vcpu->arch.apf.pageready_pending in kvm_arch_async_page_present_queued()
> > is benign.
> 
> In principle there is a possibility that a KVM_REQ_APF_READY is missed.

I think you mean a kick (wakeup or IPI), is missed, not that the APF_READY itself
is missed.  I.e. KVM_REQ_APF_READY will never be lost, KVM just might enter the
guest or schedule out the vCPU with the flag set.

All in all, I think we're in violent agreement.  I agree that kvm_vcpu_kick()
could be missed (theoretically), but I'm saying that missing the kick would be
benign due to a myriad of other barriers and checks, i.e. that the vCPU is
guaranteed to see KVM_REQ_APF_READY anyways.

E.g. my suggestion earlier regarding OUTSIDE_GUEST_MODE was to rely on the
smp_mb__after_srcu_read_{,un}lock() barriers in vcpu_enter_guest() to ensure
KVM_REQ_APF_READY would be observed before trying VM-Enter, and that if KVM might
be in the process of emulating HLT (blocking), that either KVM_REQ_APF_READY is
visible to the vCPU or that kvm_arch_async_page_present() wakes the vCPU.  Oh,
hilarious, async_pf_execute() also does an unconditional __kvm_vcpu_wake_up().

Huh.  But isn't that a real bug?  KVM doesn't consider KVM_REQ_APF_READY to be a
wake event, so isn't this an actual race?

  vCPU                                  async #PF
  kvm_check_async_pf_completion()
  pageready_pending = false
  VM-Enter
  HLT
  VM-Exit
                                        kvm_make_request(KVM_REQ_APF_READY, vcpu)
                                        kvm_vcpu_kick(vcpu)  // nop as the vCPU isn't blocking, yet
                                        __kvm_vcpu_wake_up() // nop for the same reason
  vcpu_block()
  <hang>

On x86, the "page ready" IRQ is only injected from vCPU context, so AFAICT nothing
is guarnateed wake the vCPU in the above sequence.

> broken:
> 
> 	kvm_make_request(KVM_REQ_APF_READY, vcpu);
> 	if (!vcpu->arch.apf.pageready_pending)
>    		kvm_vcpu_kick(vcpu);
> 
> It won't happen because set_bit() is written with asm("memory"), because x86
> set_bit() does prevent reordering at the processor level, etc.
> 
> In other words the race is only avoided by the fact that compiler
> reorderings are prevented even in cases that memory-barriers.txt does not
> promise.