Date:   Sun, 10 Oct 2021 15:37:56 +0300
From:   Maxim Levitsky <>
To:     Sean Christopherson <>,
        Paolo Bonzini <>
Cc:     Vitaly Kuznetsov <>,
        Wanpeng Li <>,
        Jim Mattson <>,
        Joerg Roedel <>,,
Subject: Re: [PATCH 0/2] KVM: x86: Fix and cleanup for recent AVIC changes

On Fri, 2021-10-08 at 18:01 -0700, Sean Christopherson wrote:
> Belated "code review" for Maxim's recent series to rework the AVIC inhibit
> code.  Using the global APICv status in the page fault path is wrong as
> the correct status is always the vCPU's, since that status is accurate
> with respect to the time of the page fault.  In a similar vein, the code
> to change the inhibit can be cleaned up since KVM can't rely on ordering
> between the update and the request for anything except consumers of the
> request.
> Sean Christopherson (2):
>   KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS
>     memslot
>   KVM: x86: Simplify APICv update request logic
>  arch/x86/kvm/mmu/mmu.c |  2 +-
>  arch/x86/kvm/x86.c     | 16 +++++++---------
>  2 files changed, 8 insertions(+), 10 deletions(-)

Are you sure about it? Let me explain how the algorithm works:

- kvm_request_apicv_update:

	- take kvm->arch.apicv_update_lock

	- if the inhibition state doesn't really change (kvm->arch.apicv_inhibit_reasons stays zero or stays non-zero)
		- update kvm->arch.apicv_inhibit_reasons
		- release the lock

		* since kvm->arch.apicv_update_lock is taken, all vCPUs will be kicked out of guest
		  mode and will be either doing something in KVM (like a page fault) or stuck trying to process that request.
		  The important thing is that no vCPU will be able to get back to guest mode.

	- update kvm->arch.apicv_inhibit_reasons
		* since we hold kvm->arch.apicv_update_lock, vCPUs can't see the new value

	- update the SPTE that covers the APIC's MMIO window:

		- if we enable AVIC, then do nothing.
			* First vCPU to access it will page fault and populate that SPTE

			* If we race with a page fault again, no problem: worst case the page fault
			  doesn't populate the SPTE, and we will get another page fault later
			  that will.

			  -> SPTE not present + AVIC enabled is not a problem, it just causes
			  a spurious page fault; the access is then retried, at which point AVIC is used.

			  It is nice to re-install the SPTE as fast as possible to avoid such
			  faults for performance reasons.

		- if we disable AVIC, then we zap the SPTE:

			* a page fault should not happen just before the zapping, as AVIC is still enabled on the vCPUs at that point.
			  Even if it does happen, it doesn't matter if it populates the SPTE, as we will zap it anyway.

			* during the zapping we take the mmu lock and use the mmu notifier counter hack
			  to avoid racing with a page fault that happens concurrently with it.

			* if a page fault on another vCPU happens after the zapping, it will see the correct
			  kvm->arch.apicv_inhibit_reasons (but likely an incorrect AVIC inhibit state of its own vCPU)
			  and will not re-populate the SPTE.

			  -> SPTE present + AVIC inhibited on this vCPU is the problem,
			  as it will cause writes to AVIC to disappear into the dummy page mapped by that SPTE.

			  That is why patch 1 IMHO is wrong.

	- release the kvm->arch.apicv_update_lock
		* at that point all vCPUs can re-enter the guest, but they will all process KVM_REQ_APICV_UPDATE
		  before doing so, which will update their AVIC state.
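
The steps above can be illustrated with a toy user-space model. This is only a
sketch of the argument as described in this mail, not kernel code: the class and
function names (Vm, VCpu, request_apicv_update, page_fault, process_request) are
hypothetical, and the model ignores kvm->arch.apicv_update_lock, the mmu lock and
the mmu notifier counter. It only shows which status the page fault handler
consults in the window between the zap and the vCPU processing its request.

```python
from dataclasses import dataclass, field


@dataclass
class VCpu:
    apicv_active: bool = True     # per-vCPU state, refreshed only by the request
    update_pending: bool = False  # stands in for KVM_REQ_APICV_UPDATE


@dataclass
class Vm:
    inhibit_reasons: int = 0      # stands in for kvm->arch.apicv_inhibit_reasons
    spte_present: bool = False    # SPTE covering the APIC MMIO window
    vcpus: list = field(default_factory=lambda: [VCpu(), VCpu()])


def request_apicv_update(vm, activate, bit):
    """Toy update path (the real one runs under apicv_update_lock)."""
    old = vm.inhibit_reasons
    new = old & ~(1 << bit) if activate else old | (1 << bit)
    if bool(old) == bool(new):
        # Inhibition state doesn't really change: just record the reasons.
        vm.inhibit_reasons = new
        return
    for vcpu in vm.vcpus:
        vcpu.update_pending = True
    vm.inhibit_reasons = new
    if new:
        vm.spte_present = False   # disabling AVIC: zap the SPTE
    # Enabling AVIC: do nothing, the first access faults and populates it.


def page_fault(vm, vcpu, use_vcpu_status):
    """Populate the SPTE only if AVIC looks enabled to the chosen status."""
    if use_vcpu_status:
        inhibited = not vcpu.apicv_active
    else:
        inhibited = bool(vm.inhibit_reasons)
    if not inhibited:
        vm.spte_present = True
    return vm.spte_present


def process_request(vm, vcpu):
    """What a vCPU does before re-entering the guest."""
    if vcpu.update_pending:
        vcpu.apicv_active = not vm.inhibit_reasons
        vcpu.update_pending = False


if __name__ == "__main__":
    vm = Vm(spte_present=True)
    request_apicv_update(vm, activate=False, bit=0)
    # vCPU 1 faults after the zap but before processing its request.
    print("global status:", page_fault(vm, vm.vcpus[1], use_vcpu_status=False))
    print("vCPU status:  ", page_fault(vm, vm.vcpus[1], use_vcpu_status=True))
```

In this model, consulting the per-vCPU status in that window re-populates the
SPTE while the VM-wide inhibit reasons are already non-zero, which is the
"SPTE present + AVIC inhibited" case described above.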

Best regards,
	Maxim Levitsky
