Message-ID: <aIeX0GQh1Q_4N597@google.com>
Date: Mon, 28 Jul 2025 08:33:26 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Binbin Wu <binbin.wu@...ux.intel.com>
Cc: Nikolay Borisov <nik.borisov@...e.com>, Jianxiong Gao <jxgao@...gle.com>, 
	"Borislav Petkov (AMD)" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Dionna Glaze <dionnaglaze@...gle.com>, "H. Peter Anvin" <hpa@...or.com>, jgross@...e.com, 
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, pbonzini@...hat.com, 
	Peter Gonda <pgonda@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Tom Lendacky <thomas.lendacky@....com>, Vitaly Kuznetsov <vkuznets@...hat.com>, x86@...nel.org, 
	Rick Edgecombe <rick.p.edgecombe@...el.com>, jiewen.yao@...el.com
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX

+Jiewen

Summary, with the questions at the end.

Recent upstream kernels running in GCE SNP/TDX VMs fail to probe the TPM: the
TPM driver's ioremap() (UC) fails because the kernel has already mapped the
range with a cacheable (WB) mapping.

 ioremap error for 0xfed40000-0xfed45000, requested 0x2, got 0x0
 tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -12
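
For anyone decoding the error: the "requested 0x2, got 0x0" values are
enum page_cache_mode (arch/x86/include/asm/pgtable_types.h):

  /* Abridged; only the values relevant to the error above. */
  enum page_cache_mode {
      _PAGE_CACHE_MODE_WB       = 0,  /* what ACPI's mapping left behind */
      _PAGE_CACHE_MODE_WC       = 1,
      _PAGE_CACHE_MODE_UC_MINUS = 2,  /* what the TPM driver's ioremap() wants */
      _PAGE_CACHE_MODE_UC       = 3,
  };

i.e. the TPM driver asked for UC- and PAT handed back the existing WB.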

The "guilty" commit is 8e690b817e38 ("x86/kvm: Override default caching mode for
SEV-SNP and TDX"), which as the subject suggests, forces the kernel's MTRR memtype
to WB.  With SNP and TDX, the virtual MTRR state is (a) controlled by the VMM and
thus is untrusted, and (b) _should_ be irrelevant because no known hypervisor
actually honors the memtypes programmed into the virtual MTRRs.
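
For reference, the override in that commit boils down to roughly the following
(paraphrasing from memory, not the literal hunk):

  /* arch/x86/kernel/kvm.c (roughly): the virtual MTRRs are VMM-controlled
   * and ignored by hypervisors anyway, so force everything to WB. */
  guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);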

It turns out that the kernel has been relying on the MTRRs to force the TPM TIS
region (and potentially other regions) to be UC, so that the kernel ACPI driver's
attempts to map SystemMemory entries as cacheable get forced to UC.  With MTRRs
forced WB, x86_acpi_os_ioremap() succeeds in creating a WB mapping, which in turn
causes the ioremap infrastructure to reject the TPM driver's UC mapping.
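
If it helps to see that interaction in isolation, here's a minimal sketch
(untested, purely illustrative) of the two conflicting mappings:

  #include <linux/module.h>
  #include <linux/io.h>

  static int __init tpm_memtype_demo_init(void)
  {
      void __iomem *wb, *uc;

      /* What ACPI ends up doing via x86_acpi_os_ioremap(): a WB mapping. */
      wb = ioremap_cache(0xfed40000, 0x5000);
      if (!wb)
          return -ENOMEM;

      /*
       * What the TPM TIS driver then does: UC- via plain ioremap().  With
       * the WB mapping still live, memtype_reserve() refuses the conflict
       * and ioremap() returns NULL.
       */
      uc = ioremap(0xfed40000, 0x5000);
      pr_info("UC- mapping of TPM TIS %s\n",
              uc ? "succeeded?!" : "failed, as described");

      if (uc)
          iounmap(uc);
      iounmap(wb);
      return 0;
  }
  module_init(tpm_memtype_demo_init);

  MODULE_DESCRIPTION("Demo: WB vs UC- memtype conflict on the TPM TIS range");
  MODULE_LICENSE("GPL");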

IIUC, the TPM entries in the ACPI tables for GCE VMs are derived (built?) from
EDK2's TPM ASL.  And (again, IIUC), this code in SecurityPkg/Tcg/Tcg2Acpi/Tpm.asl[1]

      //
      // Operational region for TPM access
      //
      OperationRegion (TPMR, SystemMemory, 0xfed40000, 0x5000)

generates the problematic SystemMemory entry that triggers the ACPI driver's
auto-mapping logic.
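
That auto-mapping lives in drivers/acpi/acpica/exregion.c; heavily abridged,
the SystemMemory handler maps a window over the region on the first field
access that misses its cached mapping:

  /* acpi_ex_system_memory_space_handler(), abridged. */
  if (!mem_info->mapped_logical_address)
      mem_info->mapped_logical_address =
          acpi_os_map_memory(address, map_length);

and acpi_os_map_memory() is where the WB request comes from.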

QEMU-based VMs don't suffer the same fate: QEMU intentionally[2] doesn't use
EDK2's AML for the TPM, and it defines only a Memory32Fixed entry, not a
SystemMemory entry.

Presumably this is an EDK2 bug?  If it's not an EDK2 bug, then how is the kernel's
ACPI driver supposed to know that some ranges of SystemMemory must be mapped UC?

[1] https://github.com/tianocore/edk2/blob/master/SecurityPkg/Tcg/Tcg2Acpi/Tpm.asl#L53
[2] https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03397.html

On Thu, Jul 24, 2025, Binbin Wu wrote:
> On 7/23/2025 10:34 PM, Sean Christopherson wrote:
> > On Mon, Jul 14, 2025, Nikolay Borisov wrote:
> > > On 14.07.25 г. 12:06 ч., Binbin Wu wrote:
> > > > On 7/10/2025 12:54 AM, Jianxiong Gao wrote:
> > > > > I tested this patch on top of commit 8e690b817e38; however, we are
> > > > > still experiencing the same failure.
> > > > > 
> > > > I didn't reproduce the issue with QEMU.
> > > > After comparing how QEMU builds the ACPI tables for HPET and TPM:
> > > > 
> > > > - For HPET, the HPET range is added as Operation Region:
> > > >       aml_append(dev,
> > > >           aml_operation_region("HPTM", AML_SYSTEM_MEMORY,
> > > >                                aml_int(HPET_BASE), HPET_LEN));
> > > > 
> > > > - For TPM, the range is added as 32-Bit Fixed Memory Range:
> > > >       if (TPM_IS_TIS_ISA(tpm_find())) {
> > > >           aml_append(crs, aml_memory32_fixed(TPM_TIS_ADDR_BASE,
> > > >                      TPM_TIS_ADDR_SIZE, AML_READ_WRITE));
> > > >       }
> > > > 
> > > > So, in KVM guests, the code path for TPM differs from the trace for HPET in
> > > > the patch https://lore.kernel.org/kvm/20250201005048.657470-3-seanjc@google.com/:
> > > > HPET triggers the acpi_os_map_iomem() code path, but TPM doesn't.
> > Argh, I was looking at the wrong TPM resource when poking through QEMU.  I peeked
> > at TPM_PPI_ADDR_BASE, which gets an AML_SYSTEM_MEMORY entry, not TPM_TIS_ADDR_BASE.

...

> I guess Google has defined an ACPI method to access the region for TPM TIS during
> ACPI device probe.
> 
> > 
> > In the meantime, can someone who has reproduced the real issue get backtraces to
> > confirm or disprove that acpi_os_map_iomem() is trying to map the TPM TIS range
> > as WB?  E.g. with something like so:
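
The debug diff is trimmed from the quote above; judging by the log below, it
was a print+WARN in memtype_reserve() when anything reserves a memtype over
the TPM TIS range, i.e. a hypothetical reconstruction along these lines:

  /* arch/x86/mm/pat/memtype.c, memtype_reserve(): hypothetical debug hack,
   * reconstructed from the log output below. */
  if (start <= 0xfed40000 && end > 0xfed40000) {
      pr_info("Mapping TPM TIS with req_type = %d\n", req_type);
      WARN_ON(1);
  }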

Got confirmation off-list that Google's ACPI tables do trigger the kernel's
cacheable mapping logic for SystemMemory.

 Mapping TPM TIS with req_type = 0
 WARNING: CPU: 22 PID: 1 at arch/x86/mm/pat/memtype.c:530 memtype_reserve+0x2ab/0x460
 Modules linked in:
 CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W           6.16.0-rc7+ #2 VOLUNTARY 
 Tainted: [W]=WARN
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/29/2025
 RIP: 0010:memtype_reserve+0x2ab/0x460
  __ioremap_caller+0x16d/0x3d0
  ioremap_cache+0x17/0x30
  x86_acpi_os_ioremap+0xe/0x20
  acpi_os_map_iomem+0x1f3/0x240
  acpi_os_map_memory+0xe/0x20
  acpi_ex_system_memory_space_handler+0x273/0x440
  acpi_ev_address_space_dispatch+0x176/0x4c0
  acpi_ex_access_region+0x2ad/0x530
  acpi_ex_field_datum_io+0xa2/0x4f0
  acpi_ex_extract_from_field+0x296/0x3e0
  acpi_ex_read_data_from_field+0xd1/0x460
  acpi_ex_resolve_node_to_value+0x2ee/0x530
  acpi_ex_resolve_to_value+0x1f2/0x540
  acpi_ds_evaluate_name_path+0x11b/0x190
  acpi_ds_exec_end_op+0x456/0x960
  acpi_ps_parse_loop+0x27a/0xa50
  acpi_ps_parse_aml+0x226/0x600
  acpi_ps_execute_method+0x172/0x3e0
  acpi_ns_evaluate+0x175/0x5f0
  acpi_evaluate_object+0x213/0x490
  acpi_evaluate_integer+0x6d/0x140
  acpi_bus_get_status+0x93/0x150
  acpi_add_single_object+0x43a/0x7c0
  acpi_bus_check_add+0x149/0x3a0
  acpi_bus_check_add_1+0x16/0x30
  acpi_ns_walk_namespace+0x22c/0x360
  acpi_walk_namespace+0x15c/0x170
  acpi_bus_scan+0x1dd/0x200
  acpi_scan_init+0xe5/0x2b0
  acpi_init+0x264/0x5b0
  do_one_initcall+0x5a/0x310
  kernel_init_freeable+0x34f/0x4f0
  kernel_init+0x1b/0x200
  ret_from_fork+0x186/0x1b0
  ret_from_fork_asm+0x1a/0x30
  </TASK>
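
The trace shows the expected chain: _STA evaluation hits the SystemMemory
handler, which lands in x86_acpi_os_ioremap(), and on x86 that is
(paraphrasing, modulo kernel version) just a cacheable mapping:

  /* x86 ACPI mapping helper, paraphrased: SystemMemory gets mapped WB,
   * historically relying on MTRRs to demote ranges like TPM TIS to UC. */
  void __iomem *x86_acpi_os_ioremap(acpi_physical_address phys, acpi_size size)
  {
      return ioremap_cache(phys, size);
  }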

> I tried adding an AML_SYSTEM_MEMORY operation region for TPM TIS to the ACPI
> table in QEMU and modifying the _STA method to access the region; with that,
> the issue can be reproduced.
> 
> diff --git a/hw/tpm/tpm_tis_isa.c b/hw/tpm/tpm_tis_isa.c
> index 876cb02cb5..aca2b2993f 100644
> --- a/hw/tpm/tpm_tis_isa.c
> +++ b/hw/tpm/tpm_tis_isa.c
> @@ -143,6 +143,9 @@ static void build_tpm_tis_isa_aml(AcpiDevAmlIf *adev, Aml *scope)
>      Aml *dev, *crs;
>      TPMStateISA *isadev = TPM_TIS_ISA(adev);
>      TPMIf *ti = TPM_IF(isadev);
> +    Aml *field;
> +    Aml *method;
> +    Aml *test = aml_local(0);
> 
>      dev = aml_device("TPM");
>      if (tpm_tis_isa_get_tpm_version(ti) == TPM_VERSION_2_0) {
> @@ -152,7 +155,19 @@ static void build_tpm_tis_isa_aml(AcpiDevAmlIf *adev, Aml *scope)
>          aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C31")));
>      }
>      aml_append(dev, aml_name_decl("_UID", aml_int(1)));
> -    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> +
> +    aml_append(dev, aml_operation_region("TPMM", AML_SYSTEM_MEMORY, aml_int(TPM_TIS_ADDR_BASE),
> +                         TPM_TIS_ADDR_SIZE));
> +
> +    field = aml_field("TPMM", AML_DWORD_ACC, AML_LOCK, AML_PRESERVE);
> +    aml_append(field, aml_named_field("TEST", 32));
> +    aml_append(dev, field);
> +
> +    method = aml_method("_STA", 0, AML_NOTSERIALIZED);
> +    aml_append(method, aml_store(aml_name("TEST"), test));
> +    aml_append(method, aml_return(aml_int(0xF)));
> +    aml_append(dev, method);


