Message-ID: <61c01b21-7422-4bcb-895d-57b0eb07b5ff@intel.com>
Date: Wed, 6 Mar 2024 13:43:34 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, Gerd Hoffmann <kraxel@...hat.com>
Cc: kvm@...r.kernel.org, Tom Lendacky <thomas.lendacky@....com>,
Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] kvm: set guest physical bits in CPUID.0x80000008
On 3/6/2024 12:35 AM, Sean Christopherson wrote:
> KVM: x86:
>
> On Tue, Mar 05, 2024, Gerd Hoffmann wrote:
>> Set CPUID.0x80000008:EAX[23:16] to guest phys bits, i.e. the bits which
>> are actually addressable. In most cases this is identical to the host
>> phys bits, but tdp restrictions (no 5-level paging) can limit this to
>> 48.
>>
>> Quoting AMD APM (revision 3.35):
>>
>> 23:16 GuestPhysAddrSize Maximum guest physical address size in bits.
>> This number applies only to guests using nested
>> paging. When this field is zero, refer to the
>> PhysAddrSize field for the maximum guest
>> physical address size. See “Secure Virtual
>> Machine” in APM Volume 2.
>>
>> Tom Lendacky confirmed the purpose of this field is software use,
>> hardware always returns zero here.
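To make the field layout in the quoted APM text concrete, here is a minimal user-space sketch of how a consumer might decode CPUID.0x80000008:EAX and apply the "zero means refer to PhysAddrSize" rule. The struct and function names are hypothetical and are not taken from KVM or QEMU.

```c
#include <stdint.h>

/* Fields of CPUID.0x80000008:EAX (names here are illustrative only). */
struct addr_sizes {
	uint8_t phys_as;	/* EAX[7:0]:   PhysAddrSize */
	uint8_t virt_as;	/* EAX[15:8]:  linear (virtual) address size */
	uint8_t g_phys_as;	/* EAX[23:16]: GuestPhysAddrSize */
};

/* Split a raw EAX value into its three 8-bit fields. */
static struct addr_sizes decode_8000_0008_eax(uint32_t eax)
{
	struct addr_sizes s = {
		.phys_as   = eax & 0xff,
		.virt_as   = (eax >> 8) & 0xff,
		.g_phys_as = (eax >> 16) & 0xff,
	};

	return s;
}

/*
 * Per the APM text, a GuestPhysAddrSize of zero means the field is not
 * reported, so fall back to PhysAddrSize.
 */
static uint8_t max_addressable_gpa_bits(uint32_t eax)
{
	struct addr_sizes s = decode_8000_0008_eax(eax);

	return s.g_phys_as ? s.g_phys_as : s.phys_as;
}
```

With EAX = 0x00303034 (PhysAddrSize 52, GuestPhysAddrSize 48) the helper returns 48. With EAX = 0x00003034 (GuestPhysAddrSize zero) it falls back to 52.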
>>
>> Signed-off-by: Gerd Hoffmann <kraxel@...hat.com>
>> ---
>> arch/x86/kvm/mmu.h | 2 ++
>> arch/x86/kvm/cpuid.c | 3 ++-
>> arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
>> 3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 60f21bb4c27b..42b5212561c8 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -100,6 +100,8 @@ static inline u8 kvm_get_shadow_phys_bits(void)
>> return boot_cpu_data.x86_phys_bits;
>> }
>>
>> +int kvm_mmu_get_guest_phys_bits(void);
>> +
>> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
>> void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
>> void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index adba49afb5fe..12037f1b017e 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -1240,7 +1240,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>> else if (!g_phys_as)
>
> Based on the new information that GuestPhysAddrSize is software-defined, and the
> fact that KVM and QEMU are planning on using GuestPhysAddrSize to communicate
> the maximum *addressable* GPA, deriving PhysAddrSize from GuestPhysAddrSize is
> wrong.
>
> E.g. if KVM is running as L1 on top of a new KVM, on a CPU with MAXPHYADDR=52,
> and on a CPU without 5-level TDP, then KVM (as L1) will see:
>
> PhysAddrSize = 52
> GuestPhysAddrSize = 48
>
> Propagating GuestPhysAddrSize to PhysAddrSize (which is confusingly g_phys_as)
> will yield an L2 with
>
> PhysAddrSize = 48
> GuestPhysAddrSize = 48
>
> which is broken, because GPAs with bits 51:48!=0 are *legal*, but not addressable.
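The legal-versus-addressable distinction in the example above can be sketched in user space as follows. This is only an illustration of the reasoning, not KVM code, and all helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if gpa has no bits set at or above 'bits' (valid for 1..63). */
static bool gpa_fits(uint64_t gpa, unsigned int bits)
{
	return (gpa >> bits) == 0;
}

/* A GPA is legal if it fits in PhysAddrSize (the guest's MAXPHYADDR). */
static bool gpa_is_legal(uint64_t gpa, unsigned int phys_as)
{
	return gpa_fits(gpa, phys_as);
}

/*
 * A GPA is addressable if it also fits in GuestPhysAddrSize, where a
 * value of zero means "not reported, use PhysAddrSize".
 */
static bool gpa_is_addressable(uint64_t gpa, unsigned int phys_as,
			       unsigned int g_phys_as)
{
	unsigned int limit = g_phys_as ? g_phys_as : phys_as;

	return gpa_fits(gpa, limit);
}
```

Take a GPA with bit 50 set. With PhysAddrSize = 52 and GuestPhysAddrSize = 48 it is legal but not addressable, which is the intended meaning. If PhysAddrSize is instead reported as 48, the same GPA is classified as not legal at all, and that is the breakage described above.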
>
>> g_phys_as = phys_as;
>>
>> - entry->eax = g_phys_as | (virt_as << 8);
>> + entry->eax = g_phys_as | (virt_as << 8)
>> + | kvm_mmu_get_guest_phys_bits() << 16;
>
> The APM explicitly states that GuestPhysAddrSize only applies to NPT. KVM should
> follow suit to avoid creating unnecessary ABI, and because KVM can address any
> legal GPA when using shadow paging.
>
>> entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
>> entry->edx = 0;
>> cpuid_entry_override(entry, CPUID_8000_0008_EBX);
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 2d6cdeab1f8a..8bebb3e96c8a 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -5267,6 +5267,21 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
>> return max_tdp_level;
>> }
>>
>> +/*
>> + * return the actually addressable guest phys bits, which might be
>> + * less than host phys bits due to tdp restrictions.
>> + */
>> +int kvm_mmu_get_guest_phys_bits(void)
>> +{
>> + if (tdp_enabled && shadow_phys_bits > 48) {
>> + if (tdp_root_level && tdp_root_level != PT64_ROOT_5LEVEL)
>> + return 48;
>> + if (max_tdp_level != PT64_ROOT_5LEVEL)
>> + return 48;
>
> I would prefer to not use shadow_phys_bits to cap the reported CPUID.0x8000_0008,
> so that the logic isn't spread across the CPUID code and the MMU. I don't love
> that the two have duplicate logic, but there's no great way to handle that since
> the MMU needs to be able to determine the effective host MAXPHYADDR even if
> CPUID.0x8000_0008 is unsupported.
>
> I'm thinking this, maybe spread across two patches: one to undo KVM's usage of
> GuestPhysAddrSize, and a second to then set GuestPhysAddrSize for userspace?
The code below looks good to me, and splitting it into two patches makes sense.
> ---
> arch/x86/kvm/cpuid.c | 38 ++++++++++++++++++++++++++++----------
> arch/x86/kvm/mmu.h | 2 ++
> arch/x86/kvm/mmu/mmu.c | 5 +++++
> 3 files changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index adba49afb5fe..ae03e69d7fb9 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1221,9 +1221,18 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> entry->eax = entry->ebx = entry->ecx = 0;
> break;
> case 0x80000008: {
> - unsigned g_phys_as = (entry->eax >> 16) & 0xff;
> - unsigned virt_as = max((entry->eax >> 8) & 0xff, 48U);
> - unsigned phys_as = entry->eax & 0xff;
> + unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U);
> +
> + /*
> + * KVM's ABI is to report the effective MAXPHYADDR for the guest
> + * in PhysAddrSize (phys_as), and the maximum *addressable* GPA
> + * in GuestPhysAddrSize (g_phys_as). GuestPhysAddrSize is valid
> + * if and only if TDP is enabled, in which case the max GPA that
> + * can be addressed by KVM may be less than the max GPA that can
> + * be legally generated by the guest, e.g. if MAXPHYADDR>48 but
> + * the CPU doesn't support 5-level TDP.
> + */
> + unsigned int phys_as, g_phys_as;
>
> /*
> * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
> @@ -1231,16 +1240,25 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> * reductions in MAXPHYADDR for memory encryption affect shadow
> * paging, too.
> *
> - * If TDP is enabled but an explicit guest MAXPHYADDR is not
> - * provided, use the raw bare metal MAXPHYADDR as reductions to
> - * the HPAs do not affect GPAs.
> + * If TDP is enabled, the effective guest MAXPHYADDR is the same
> + * as the raw bare metal MAXPHYADDR, as reductions to HPAs don't
> + * affect GPAs. The max addressable GPA is the same as the max
> + * effective GPA, except that it's capped at 48 bits if 5-level
> + * TDP isn't supported (hardware processes bits 51:48 only when
> + * walking the fifth level page table).
> */
> - if (!tdp_enabled)
> - g_phys_as = boot_cpu_data.x86_phys_bits;
> - else if (!g_phys_as)
> + if (!tdp_enabled) {
> + phys_as = boot_cpu_data.x86_phys_bits;
> + g_phys_as = 0;
> + } else {
> + phys_as = entry->eax & 0xff;
> g_phys_as = phys_as;
>
> - entry->eax = g_phys_as | (virt_as << 8);
> + if (kvm_mmu_get_max_tdp_level() < 5)
> + g_phys_as = min(g_phys_as, 48);
> + }
> +
> + entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);
> entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
> entry->edx = 0;
> cpuid_entry_override(entry, CPUID_8000_0008_EBX);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 60f21bb4c27b..b410a227c601 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -100,6 +100,8 @@ static inline u8 kvm_get_shadow_phys_bits(void)
> return boot_cpu_data.x86_phys_bits;
> }
>
> +u8 kvm_mmu_get_max_tdp_level(void);
> +
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
> void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
> void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2d6cdeab1f8a..ffd32400fd8c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5267,6 +5267,11 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> return max_tdp_level;
> }
>
> +u8 kvm_mmu_get_max_tdp_level(void)
> +{
> + return tdp_root_level ? tdp_root_level : max_tdp_level;
> +}
> +
> static union kvm_mmu_page_role
> kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
> union kvm_cpu_role cpu_role)
>
> base-commit: c0372e747726ce18a5fba8cdc71891bd795148f6
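For reference, the EAX computation in the diff above can be modeled as a standalone user-space function. This is a sketch under stated assumptions, not the kernel implementation: the function name and parameters are hypothetical, adjusted_host_phys_bits stands in for boot_cpu_data.x86_phys_bits, and max_tdp_level stands in for the value returned by kvm_mmu_get_max_tdp_level().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the proposed CPUID.0x80000008:EAX computation.
 * host_eax is the raw EAX value read from hardware.
 */
static uint32_t model_8000_0008_eax(uint32_t host_eax, bool tdp_enabled,
				    uint8_t adjusted_host_phys_bits,
				    uint8_t max_tdp_level)
{
	uint32_t virt_as = (host_eax >> 8) & 0xff;
	uint32_t phys_as, g_phys_as;

	/* Mirrors max(virt_as, 48U) in the diff. */
	if (virt_as < 48)
		virt_as = 48;

	if (!tdp_enabled) {
		/* Shadow paging: any legal GPA is addressable. */
		phys_as = adjusted_host_phys_bits;
		g_phys_as = 0;
	} else {
		/* TDP: reductions to HPAs do not affect GPAs. */
		phys_as = host_eax & 0xff;
		g_phys_as = phys_as;

		/* Without 5-level TDP, bits 51:48 cannot be mapped. */
		if (max_tdp_level < 5 && g_phys_as > 48)
			g_phys_as = 48;
	}

	return phys_as | (virt_as << 8) | (g_phys_as << 16);
}
```

For a host EAX of 0x00003034 (MAXPHYADDR 52, 48 virtual bits), the model yields 0x00303034 with 4-level TDP, 0x00343034 with 5-level TDP, and 0x0000302e with TDP disabled and an adjusted host MAXPHYADDR of 46.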