[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20211108144505.fz3p4fw4q2efj32r@box.shutemov.name>
Date: Mon, 8 Nov 2021 17:45:05 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Richard Henderson <rth@...ddle.net>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
James E J Bottomley <James.Bottomley@...senpartnership.com>,
Helge Deller <deller@....de>,
"David S . Miller" <davem@...emloft.net>,
Arnd Bergmann <arnd@...db.de>,
Jonathan Corbet <corbet@....net>,
"Michael S . Tsirkin" <mst@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>,
David Hildenbrand <david@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Peter H Anvin <hpa@...or.com>,
Dave Hansen <dave.hansen@...el.com>,
Tony Luck <tony.luck@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
x86@...nel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
sparclinux@...r.kernel.org, linux-arch@...r.kernel.org,
linux-doc@...r.kernel.org,
virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask
On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote:
> On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> >
> > Just like MKTME, TDX reassigns bits of the physical address for
> > metadata. MKTME used several bits for an encryption KeyID. TDX
> > uses a single bit in guests to communicate whether a physical page
> > should be protected by TDX as private memory (bit set to 0) or
> > unprotected and shared with the VMM (bit set to 1).
> >
> > Add a helper, tdx_shared_mask() to generate the mask. The processor
> > enumerates its physical address width to include the shared bit, which
> > means it gets included in __PHYSICAL_MASK by default.
>
> This is incorrect. The shared bit _may_ be a legal PA bit, but AIUI it's not a
> hard requirement.
Good point, will fix.
> > Remove the shared mask from 'physical_mask' since any bits in
> > tdx_shared_mask() are not used for physical addresses in page table
> > entries.
>
> ...
>
> > @@ -94,6 +100,9 @@ static void tdx_get_info(void)
> >
> > td_info.gpa_width = out.rcx & GENMASK(5, 0);
> > td_info.attributes = out.rdx;
> > +
> > + /* Exclude Shared bit from the __PHYSICAL_MASK */
> > + physical_mask &= ~tdx_shared_mask();
>
> This is insufficient, though it's not really the fault of this patch, the specs
> themselves botch this whole thing.
>
> The TDX Module spec explicitly states that GPAs above GPAW are considered reserved.
>
> 10.11.1. GPAW-Relate EPT Violations
> GPA bits higher than the SHARED bit are considered reserved and must be 0.
> Address translation with any of the reserved bits set to 1 cause a #PF with
> PFEC (Page Fault Error Code) RSVD bit set.
>
> But this is contradicted by the architectural extensions spec, which states that
> a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF.
> Note, this section also appears to have a bug, as it states that GPA bit 47 is
> both the SHARED bit and reserved. I assume that blurb is intended to clarify
> that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
> the shared bit it's ok to access.
>
> 1.4.2
> Guest Physical Address Translation
> If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical
> address width is configured to be 48, accesses with GPA bits 51:48 not all being
> 0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE,
> even if the “EPT-violations #VE” execution control is 1.
>
> If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit
> is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
> 46:MAXPA in any paging structure can cause a reserved bit page fault on access.
>
> The Module spec also calls out that the effective GPA is not to be confused with
> MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA
> is enumerated separately by design so that the guest doesn't incorrectly think
> 46:MAXPA are usable. But that is problematic for the case where MAXPA > GPAW.
>
> The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
> SHARED bit is at GPA bit GPAW-1.
>
> I can't find the exact reference, but the TDX module always passes through host's
> MAXPHYADDR. As it pertains to this patch, just doing
>
> physical_mask &= ~tdx_shared_mask()
>
> means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous
> physical_mask, and could access "reserved" memory. If the VMM defines legal memory
> with bits [MAXPHYADDR:48]!=0, explosions may ensue. That's arguably a VMM bug, but
> given that the VMM is untrusted I think the guest should be paranoid when handling
> the SHARED bit. I also don't know that the kernel will play nice with a discontiguous
> mask.
I expect it to be buggy.
> Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR,
> or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate,
> this mess isn't going to be truly fixed from the guest perspective.
>
> So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
> refuse to boot if the host has defined legal memory in that range.
Right. But only >= GPAW-1 as shared bit is the MSB within GPAW:
physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0);
'2' here smells bad, but well...
Given that physical_mask is now contiguous we can truncate anything from
e820 that cannot be addressed with adjusted __PHYSICAL_MASK:
iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f0deed..16d57a8769e8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
unsigned long last_pfn = 0;
unsigned long max_arch_pfn = MAX_ARCH_PFN;
+ if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1))
+ max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1);
+
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
unsigned long start_pfn;
Does it look reasonable?
> FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force
> GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49. But on the guest side,
> I think we should be paranoid.
--
Kirill A. Shutemov
Powered by blists - more mailing lists