linux-kernel - Re: [PATCH v2 5/8] KVM: x86/mmu: Use separate namespaces for guest PTEs and shadow PTEs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YqnsAIKxQniAsb1d@google.com>
Date:   Wed, 15 Jun 2022 14:26:08 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Lai Jiangshan <jiangshan.ljs@...group.com>
Subject: Re: [PATCH v2 5/8] KVM: x86/mmu: Use separate namespaces for guest
 PTEs and shadow PTEs

On Wed, Jun 15, 2022, Paolo Bonzini wrote:
> On 6/15/22 01:33, Sean Christopherson wrote:
> > Separate the macros for KVM's shadow PTEs (SPTE) from guest 64-bit PTEs
> > (PT64).  SPTE and PT64 are _mostly_ the same, but the few differences are
> > quite critical, e.g. *_BASE_ADDR_MASK must differentiate between host and
> > guest physical address spaces, and SPTE_PERM_MASK (was PT64_PERM_MASK) is
> > very much specific to SPTEs.
> > 
> > Opportunistically (and temporarily) move most guest macros into paging.h
> > to clearly associate them with shadow paging, and to ensure that they're
> > not used as of this commit.  A future patch will eliminate them entirely.
> > 
> > Sadly, PT32_LEVEL_BITS is left behind in mmu_internal.h because it's
> > needed for the quadrant calculation in kvm_mmu_get_page().  The quadrant
> > calculation is hot enough (when using shadow paging with 32-bit guests)
> > that adding a per-context helper is undesirable, and burying the
> > computation in paging_tmpl.h with a forward declaration isn't exactly an
> > improvement.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> 
> A better try:

Very nice!  It's obvious once someone else writes the code.  :-)

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 54b3e39d07b3..cd561b49cc84 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2011,8 +2011,21 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	role.direct = direct;
>  	role.access = access;
>  	if (role.has_4_byte_gpte) {
> +		/*
> +		 * The "quadrant" value corresponds to those bits of the address
> +		 * that have already been used by the 8-byte shadow page table
> +		 * lookup, but not yet in the 4-byte guest page tables.  Having
> +		 * the quadrant as part of the role ensures that each upper sPTE
> +		 * points to the the correct portion of the guest page table
> +		 * structure.
> +		 *
> +		 * For example, a 4-byte PDE consumes bits 31:22 and an 8-byte PDE
> +		 * consumes bits 29:21.  Each guest PD must be expanded into four
> +		 * shadow PDs, one for each value of bits 31:30, and the PDPEs
> +		 * will use the four quadrants in round-robin fashion.

It's not round-robin, that would imply KVM rotates through each quadrant on its
own.  FWIW, I like David's comment from his patch that simplifies this mess in a
similar way.

https://lore.kernel.org/all/20220516232138.1783324-5-dmatlack@google.com
> +		 */
>  		quadrant = gaddr >> (PAGE_SHIFT + (SPTE_LEVEL_BITS * level));
> -		quadrant &= (1 << ((PT32_LEVEL_BITS - SPTE_LEVEL_BITS) * level)) - 1;
> +		quadrant &= (1 << level) - 1;
>  		role.quadrant = quadrant;
>  	}
>  	if (level <= vcpu->arch.mmu->cpu_role.base.level)
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index cb9d4d358335..5e1e3c8f8aaa 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -20,9 +20,6 @@ extern bool dbg;
>  #define MMU_WARN_ON(x) do { } while (0)
>  #endif
> -/* The number of bits for 32-bit PTEs is to needed compute the quandrant. */

Heh, and it gets rid of my typo.

> -#define PT32_LEVEL_BITS 10
> -
>  /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
>  #define __PT_LEVEL_SHIFT(level, bits_per_level)	\
>  	(PAGE_SHIFT + ((level) - 1) * (bits_per_level))
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 6c29aef4092b..e4655056e651 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -38,7 +38,7 @@
>  	#define pt_element_t u32
>  	#define guest_walker guest_walker32
>  	#define FNAME(name) paging##32_##name
> -	#define PT_LEVEL_BITS PT32_LEVEL_BITS
> +	#define PT_LEVEL_BITS 10
>  	#define PT_MAX_FULL_LEVELS 2
>  	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
>  	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
> 
> Paolo
>