linux-kernel - Re: [PATCH v2 2/2] KVM: x86/mmu: include efer.lma in extended mmu role

Message-ID: <YZLGrFZtYhERjIcH@google.com>
Date:   Mon, 15 Nov 2021 20:44:28 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Maxim Levitsky <mlevitsk@...hat.com>
Cc:     kvm@...r.kernel.org, Vitaly Kuznetsov <vkuznets@...hat.com>,
        Joerg Roedel <joro@...tes.org>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
        <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v2 2/2] KVM: x86/mmu: include efer.lma in extended mmu
 role

On Mon, Nov 15, 2021, Maxim Levitsky wrote:
> When the host is running with normal TDP mmu (EPT/NPT),
> and it is running a nested 32 bit guest, then after a migration,
> the host mmu (aka root_mmu) is first initialized with
> nested guest's IA32_EFER, due to the way userspace restores
> the nested state.

Please try to avoid unnecessary newlines; I find them quite difficult to read, as
my eyeballs need to jump around more.  E.g. wrapping at 75 chars yields

  When the host is running with normal TDP mmu (EPT/NPT), and it is running
  a nested 32 bit guest, then after a migration, the host mmu (aka root_mmu)
  is first initialized with nested guest's IA32_EFER, due to the way
  userspace restores the nested state.

  When later, this is corrected on first nested VM exit to the host, when
  host EFER is loaded from vmcs12, the root_mmu is not reset, because the
  role.base.level in this case, reflects the level of the TDP mmu which is
  always 4 (or 5) on EPT, and usually 4 or even 5 on AMD (when we have
  64-bit host).

  Since most of the paging state is already captured in the extended mmu
  role, just add the EFER.LMA there to force that reset.

> When later, this is corrected on first nested VM exit to the host,
> when host EFER is loaded from vmcs12,
> the root_mmu is not reset, because the role.base.level
> in this case, reflects the level of the TDP mmu which is
> always 4 (or 5) on EPT, and usually 4 or even 5 on AMD
> (when we have 64 bit host).
>
> Since most of the paging state is already captured in
> the extended mmu role, just add the EFER.LMA there to
> force that reset.

Similar to patch 1, I'd like to word the changelog to make it very clear that this
fix is _necessary_, not just a hack to fudge around QEMU behavior.  I've spent far
too much time deciphering historical KVM changelogs along the lines of "QEMU does
XYZ, change KVM to handle that", and in more than one case the "fix" has been wrong
and/or incomplete.

  Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute
  the guest root level and is not reflected in kvm_mmu_page_role.level when
  TDP is in use.  When simply running the guest, it is impossible for
  EFER.LMA and kvm_mmu.root_level to get out of sync, as the guest cannot
  transition from PAE paging to 64-bit paging without toggling CR0.PG, i.e.
  without first bouncing through a different MMU context.  And stuffing
  guest state via KVM_SET_SREGS{2} also ensures a full MMU context reset.

  However, if KVM_SET_SREGS{2} is followed by KVM_SET_NESTED_STATE, e.g. to
  set guest state when migrating the VM while L2 is active, the vCPU state
  will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
  have been configured using L2's state, despite not being used for L2.  If
  L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
  be configured for guest PAE paging, but will match the mmu_role for 64-bit
  paging and cause KVM to not reconfigure root_mmu on the next nested
  VM-Exit.
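
To make the mismatch concrete, here is a rough, standalone sketch of the
problem.  It is not KVM code: the toy_* names, the reduced set of role bits
and the track_lma switch are all made up for illustration, and the root
level helper ignores LA57.  It only shows that, with the TDP level fixed at 4,
a role without EFER.LMA compares equal for a PAE L2 and a 64-bit L1 even
though the guest root levels differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for KVM's MMU role. */
union toy_mmu_role {
	uint32_t word;
	struct {
		unsigned int level:4;	/* TDP root level, NOT the guest's level */
		unsigned int cr0_pg:1;
		unsigned int cr4_pae:1;
		unsigned int efer_lma:1;	/* the bit the patch adds */
	};
};

/* Hypothetical subset of the guest register state that feeds the role. */
struct toy_regs {
	bool cr0_pg;
	bool cr4_pae;
	bool efer_lma;
};

/* Guest root level as a function of paging mode (LA57 ignored here). */
static int toy_guest_root_level(struct toy_regs regs)
{
	if (!regs.cr0_pg)
		return 0;
	if (regs.efer_lma)
		return 4;
	return regs.cr4_pae ? 3 : 2;
}

/* Compute the role; track_lma models "with" vs. "without" the patch. */
static union toy_mmu_role toy_calc_role(struct toy_regs regs,
					unsigned int tdp_level, bool track_lma)
{
	union toy_mmu_role role = { .word = 0 };

	assert(tdp_level == 4 || tdp_level == 5);

	role.level = tdp_level;
	role.cr0_pg = regs.cr0_pg;
	role.cr4_pae = regs.cr4_pae;
	role.efer_lma = track_lma && regs.efer_lma;
	return role;
}

/* The MMU is reconfigured only if the role word changed. */
static bool toy_needs_reinit(union toy_mmu_role cached, union toy_mmu_role new)
{
	return cached.word != new.word;
}
```

With a PAE L2 (CR0.PG=1, CR4.PAE=1, EFER.LMA=0) and a 64-bit L1 (same bits
but EFER.LMA=1), toy_guest_root_level() differs between the two, yet
toy_needs_reinit() only reports a change when track_lma is true.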

And after typing that up, it's probably also worth adding a blurb to call out (and
argue against) the alternative.

  Alternatively, the root_mmu's role could be invalidated after a successful
  KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
  i.e. that switches the active mmu to guest_mmu, but doing so would force
  KVM to reconfigure the root_mmu in the common case where L1 and L2 have
  the same EFER, e.g. are both 64-bit guests.
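
The cost of that alternative can be sketched the same way.  Again, this is a
toy model with invented names (toy_mmu, toy_init_mmu, toy_invalidate_role),
not the real KVM flow; it just counts how often the MMU would be reconfigured
when the role is unconditionally invalidated versus when it is compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a cached role word plus a reconfiguration counter. */
struct toy_mmu {
	uint32_t cached_role;
	bool role_valid;
	unsigned int nr_reinits;
};

/* Reconfigure only if the cached role is invalid or differs from new_role. */
static void toy_init_mmu(struct toy_mmu *mmu, uint32_t new_role)
{
	assert(mmu != NULL);

	if (mmu->role_valid && mmu->cached_role == new_role)
		return;

	mmu->cached_role = new_role;
	mmu->role_valid = true;
	mmu->nr_reinits++;
}

/* The alternative: drop the cached role so the next init always runs. */
static void toy_invalidate_role(struct toy_mmu *mmu)
{
	assert(mmu != NULL);
	mmu->role_valid = false;
}
```

If L1 and L2 produce the same role word, calling toy_init_mmu() twice bumps
nr_reinits once, whereas inserting toy_invalidate_role() between the calls
bumps it twice, which is the extra work in the common case.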

> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/mmu/mmu.c          | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 88fce6ab4bbd7..a44b9eb7d4d6d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -364,6 +364,7 @@ union kvm_mmu_extended_role {
>  		unsigned int cr4_smap:1;
>  		unsigned int cr4_smep:1;
>  		unsigned int cr4_la57:1;
> +		unsigned int efer_lma:1;
>  	};
>  };
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 354d2ca92df4d..5c4a41697a717 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4682,6 +4682,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
>  		/* PKEY and LA57 are active iff long mode is active. */
>  		ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
>  		ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
> +		ext.efer_lma = ____is_efer_lma(regs);
>  	}
>  
>  	ext.valid = 1;
> -- 
> 2.26.3
>