lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20211108144505.fz3p4fw4q2efj32r@box.shutemov.name>
Date:   Mon, 8 Nov 2021 17:45:05 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Richard Henderson <rth@...ddle.net>,
        Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
        James E J Bottomley <James.Bottomley@...senpartnership.com>,
        Helge Deller <deller@....de>,
        "David S . Miller" <davem@...emloft.net>,
        Arnd Bergmann <arnd@...db.de>,
        Jonathan Corbet <corbet@....net>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter H Anvin <hpa@...or.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Tony Luck <tony.luck@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        x86@...nel.org, linux-kernel@...r.kernel.org,
        linux-pci@...r.kernel.org, linux-alpha@...r.kernel.org,
        linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
        sparclinux@...r.kernel.org, linux-arch@...r.kernel.org,
        linux-doc@...r.kernel.org,
        virtualization@...ts.linux-foundation.org
Subject: Re: [PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask

On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote:
> On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> > 
> > Just like MKTME, TDX reassigns bits of the physical address for
> > metadata.  MKTME used several bits for an encryption KeyID. TDX
> > uses a single bit in guests to communicate whether a physical page
> > should be protected by TDX as private memory (bit set to 0) or
> > unprotected and shared with the VMM (bit set to 1).
> > 
> > Add a helper, tdx_shared_mask() to generate the mask.  The processor
> > enumerates its physical address width to include the shared bit, which
> > means it gets included in __PHYSICAL_MASK by default.
> 
> This is incorrect.  The shared bit _may_ be a legal PA bit, but AIUI it's not a
> hard requirement.

Good point, will fix.

> > Remove the shared mask from 'physical_mask' since any bits in
> > tdx_shared_mask() are not used for physical addresses in page table
> > entries.
> 
> ...
> 
> > @@ -94,6 +100,9 @@ static void tdx_get_info(void)
> >  
> >  	td_info.gpa_width = out.rcx & GENMASK(5, 0);
> >  	td_info.attributes = out.rdx;
> > +
> > +	/* Exclude Shared bit from the __PHYSICAL_MASK */
> > +	physical_mask &= ~tdx_shared_mask();
> 
> This is insufficient, though it's not really the fault of this patch, the specs
> themselves botch this whole thing.
> 
> The TDX Module spec explicitly states that GPAs above GPAW are considered reserved.
> 
>     10.11.1. GPAW-Relate EPT Violations
>     GPA bits higher than the SHARED bit are considered reserved and must be 0.
>     Address translation with any of the reserved bits set to 1 cause a #PF with
>     PFEC (Page Fault Error Code) RSVD bit set.
> 
> But this is contradicted by the architectural extensions spec, which states that
> a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF.
> Note, this section also appears to have a bug, as it states that GPA bit 47 is
> both the SHARED bit and reserved.  I assume that blurb is intended to clarify
> that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's
> the shared bit it's ok to access.
> 
>     1.4.2
>     Guest Physical Address Translation
>     If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical
>     address width is configured to be 48, accesses with GPA bits 51:48 not all being
>     0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE,
>     even if the “EPT-violations #VE” execution control is 1.
> 
>     If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit
>     is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
>     bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits
>     46:MAXPA in any paging structure can cause a reserved bit page fault on access.
> 
> The Module spec also calls out that the effective GPA is not to be confused with
> MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA
> is enumerated separately by design so that the guest doesn't incorrectly think
> 46:MAXPA are usable.  But that is problematic for the case where MAXPA > GPAW.
> 
>     The effective GPA width (in bits) for this TD (do not confuse with MAXPA).
>     SHARED bit is at GPA bit GPAW-1.
> 
> I can't find the exact reference, but the TDX module always passes through host's
> MAXPHYADDR.  As it pertains to this patch, just doing
> 
> 	physical_mask &= ~tdx_shared_mask()
> 
> means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous
> physical_mask, and could access "reserved" memory.  If the VMM defines legal memory
> with bits [MAXPHYADDR:48]!=0, explosions may ensue.  That's arguably a VMM bug, but
> given that the VMM is untrusted I think the guest should be paranoid when handling
> the SHARED bit.  I also don't know that the kernel will play nice with a discontiguous
> mask.

I expect it to be buggy.

> Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR,
> or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate,
> this mess isn't going to be truly fixed from the guest perspective.
> 
> So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or
> refuse to boot if the host has defined legal memory in that range.

Right. But only >= GPAW-1 as shared bit is the MSB within GPAW:

	physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0);

'2' here smells bad, but well...

Given that physical_mask is now contiguous we can truncate anything from
e820 that cannot be addressed with adjusted __PHYSICAL_MASK:

iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f0deed..16d57a8769e8 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type
 	unsigned long last_pfn = 0;
 	unsigned long max_arch_pfn = MAX_ARCH_PFN;

+	if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1))
+		max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1);
+
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		struct e820_entry *entry = &e820_table->entries[i];
 		unsigned long start_pfn;

Does it look reasonable?

> FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force
> GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49.  But on the guest side,
> I think we should be paranoid.

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ