[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1ec8527-322d-4bdb-9a38-145fd9f28e4b@linux.intel.com>
Date: Thu, 21 Aug 2025 13:23:55 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>,
Vishal Annapurve <vannapurve@...gle.com>
Cc: Nikolay Borisov <nik.borisov@...e.com>, Jianxiong Gao <jxgao@...gle.com>,
"Borislav Petkov (AMD)" <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Dionna Glaze <dionnaglaze@...gle.com>, "H. Peter Anvin" <hpa@...or.com>,
jgross@...e.com, "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>, pbonzini@...hat.com,
Peter Gonda <pgonda@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Tom Lendacky <thomas.lendacky@....com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, x86@...nel.org,
Rick Edgecombe <rick.p.edgecombe@...el.com>, jiewen.yao@...el.com
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
On 8/21/2025 11:30 AM, Binbin Wu wrote:
>
>
> On 8/21/2025 1:56 AM, Sean Christopherson wrote:
>> On Wed, Aug 20, 2025, Binbin Wu wrote:
[...]
>>> Hi Sean,
>>>
>>> Since guest_force_mtrr_state() also supports to force MTRR variable ranges,
>>> I am wondering if we could use guest_force_mtrr_state() to set the legacy PCI
>>> hole range as UC?
>>>
>>> Is it less hacky?
>> Oh! That's a way better idea than my hack. I missed that the kernel would still
>> consult MTRRs.
>>
>> Compile tested only, but something like this?
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 8ae750cde0c6..45c8871cdda1 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -933,6 +933,13 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
>> static void __init kvm_init_platform(void)
>> {
>> + u64 tolud = e820__end_of_low_ram_pfn() << PAGE_SHIFT;
>> + struct mtrr_var_range pci_hole = {
>> + .base_lo = tolud | X86_MEMTYPE_UC,
>> + .mask_lo = (u32)(~(SZ_4G - tolud - 1)) | BIT(11),
>> + .mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
>> + };
>> +
>
> This value of tolud may not meet the range size and alignment requirement for
> variable MTRR.
>
> Variable MTRR has requirement for range size and alignment:
> For ranges greater than 4 KBytes, each range must be of length 2^n and its base
> address must be aligned on a 2^n boundary, where n is a value equal to or
> greater than 12. The base-address alignment value cannot be less than its length.
Wait, Linux kernel converts MTRR register values to MTRR state (base and size) and
cache it for later lookups (refer to map_add_var()). I.e., in Linux kernel,
only the cached state will be used.
These MTRR register values are never programmed when using
guest_force_mtrr_state() , so even the values doesn't meet the
requirement from hardware perspective, Linux kernel can still get the right base and size.
No bothering to force the base and size alignment.
But a comment would be helpful.
Also, BIT(11) could be replaced by MTRR_PHYSMASK_V.
How about:
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 90097df4eafd..a9582ffc3088 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -934,9 +934,15 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
static void __init kvm_init_platform(void)
{
u64 tolud = e820__end_of_low_ram_pfn() << PAGE_SHIFT;
+ /*
+ * The range's base address and size may not meet the alignment
+ * requirement for variable MTRR. However, Linux guest never
+ * programs MTRRs when forcing guest MTRR state, no bothering to
+ * enforce the base and range size alignment.
+ */
struct mtrr_var_range pci_hole = {
.base_lo = tolud | X86_MEMTYPE_UC,
- .mask_lo = (u32)(~(SZ_4G - tolud - 1)) | BIT(11),
+ .mask_lo = (u32)(~(SZ_4G - tolud - 1)) | MTRR_PHYSMASK_V,
.mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
};
I tested it in my setup, it can fix the issue of TPM driver failure with the
modified ACPI table for TPM in QEMU.
Hi Vishal,
Could you test it with google's VMM?
>
> In my setup, the value of tolud is 0x7FF7C000, it requires 3 variable MTRRs to
> meet the requirement, i.e.,
> - 7FF7 C000 ~ 7FF8 0000
> - 7FF8 0000 ~ 8000 0000
> - 8000 0000 ~ 1 0000 0000
>
> I checks the implementation in EDK2, in order to fit the legacy PCI hole into
> one variable MTRR, it has some assumption to truncate the size and round up the
> base address in PlatformQemuUc32BaseInitialization():
> ...
> ASSERT (
> PlatformInfoHob->HostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID ||
> PlatformInfoHob->HostBridgeDevId == INTEL_82441_DEVICE_ID
> );
> ...
> //
> // Start with the [LowerMemorySize, 4GB) range. Make sure one
> // variable MTRR suffices by truncating the size to a whole power of two,
> // while keeping the end affixed to 4GB. This will round the base up.
> //
> PlatformInfoHob->Uc32Size = GetPowerOfTwo32 ((UINT32)(SIZE_4GB - PlatformInfoHob->LowMemory));
> PlatformInfoHob->Uc32Base = (UINT32)(SIZE_4GB - PlatformInfoHob->Uc32Size);
> //
> // Assuming that LowerMemorySize is at least 1 byte, Uc32Size is at most 2GB.
> // Therefore Uc32Base is at least 2GB.
> //
> ASSERT (PlatformInfoHob->Uc32Base >= BASE_2GB);
>
> I am not sure if KVM can do such assumption.
> Otherwise, KVM needs to calculate the ranges to meet the requirement. :(
>
>
>> if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
>> kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
>> unsigned long nr_pages;
>> @@ -982,8 +989,12 @@ static void __init kvm_init_platform(void)
>> kvmclock_init();
>> x86_platform.apic_post_init = kvm_apic_init;
>> - /* Set WB as the default cache mode for SEV-SNP and TDX */
>> - guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
>> + /*
>> + * Set WB as the default cache mode for SEV-SNP and TDX, with a single
>> + * UC range for the legacy PCI hole, e.g. so that devices that expect
>> + * to get UC/WC mappings don't get surprised with WB.
>> + */
>> + guest_force_mtrr_state(&pci_hole, 1, MTRR_TYPE_WRBACK);
>> }
>> #if defined(CONFIG_AMD_MEM_ENCRYPT)
>
>
Powered by blists - more mailing lists