linux-kernel - Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table


Message-Id: <B17112AE-8848-48B0-997D-E1A3D79BD395@amacapital.net>
Date:   Mon, 19 Apr 2021 11:10:45 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Brijesh Singh <brijesh.singh@....com>,
        Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
        x86@...nel.org, kvm@...r.kernel.org, linux-crypto@...r.kernel.org,
        ak@...ux.intel.com, herbert@...dor.apana.org.au,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        "H. Peter Anvin" <hpa@...or.com>, Tony Luck <tony.luck@...el.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        David Rientjes <rientjes@...gle.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table



> On Apr 19, 2021, at 10:58 AM, Dave Hansen <dave.hansen@...el.com> wrote:
> 
> On 4/19/21 10:46 AM, Brijesh Singh wrote:
>> - guest wants to make gpa 0x1000 as a shared page. To support this, we
>> need to psmash the large RMP entry into 512 4K entries. The psmash
>> instruction breaks the large RMP entry into 512 4K entries without
>> affecting the previous validation. Now we need to force the host to
>> use the 4K page level instead of the 2MB.
>> 
>> To my understanding, the Linux kernel fault handler does not build the page
>> tables on demand for kernel addresses. All kernel addresses are
>> pre-mapped at boot. Currently, I am proactively splitting the physmap
>> to avoid running into a situation where the x86 page level is greater than the
>> RMP page level.
> 
> In other words, if the host maps guest memory with 2M mappings, the
> guest can induce page faults in the host.  The only way the host can
> avoid this is to map everything with 4k mappings.
> 
> If the host does not avoid this, it could end up in the situation where
> it gets page faults on access to kernel data structures.  Imagine if a
> kernel stack page ended up in the same 2M mapping as a guest page.  I
> *think* the next write to the kernel stack would end up double-faulting.

I’m confused by this scenario. This should only affect physical pages that are in the 2M area that contains guest memory. But, if we have a 2M direct map PMD entry that contains kernel data and guest private memory, we’re already in a situation in which the kernel touching that memory would machine check, right?

ISTM we should fully unmap any guest private page from the kernel direct map and from all host user pagetables before actually making it a guest private page.
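
To make the 2M-vs-4K arithmetic in this thread concrete, here is a rough sketch of the two checks being discussed: whether two physical addresses share a 2M direct-map entry, and whether a host mapping is larger than the corresponding RMP entry. Every name here (map_level, region_2m_base, same_2m_region, needs_split) is made up for illustration and is an assumption on my part; none of it is taken from the patch series or from existing kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not the kernel's or the patch's API. */
#define SZ_4K_SHIFT     12
#define SZ_2M_SHIFT     21
/* Number of 4K entries covered by one 2M entry: 2M / 4K = 512. */
#define ENTRIES_PER_2M  (1u << (SZ_2M_SHIFT - SZ_4K_SHIFT))

/* Page levels ordered so that a larger mapping compares greater. */
enum map_level { LEVEL_4K = 1, LEVEL_2M = 2 };

/* Base physical address of the 2M-aligned region containing paddr. */
static inline uint64_t region_2m_base(uint64_t paddr)
{
	return paddr & ~((UINT64_C(1) << SZ_2M_SHIFT) - 1);
}

/* True if both addresses would be covered by the same 2M mapping. */
static inline bool same_2m_region(uint64_t a, uint64_t b)
{
	return region_2m_base(a) == region_2m_base(b);
}

/*
 * The situation described in the thread: the host (x86) page level is
 * greater than the RMP page level, so the host mapping would have to
 * be split down to 4K (or removed) to avoid the mismatch.
 */
static inline bool needs_split(enum map_level host, enum map_level rmp)
{
	return host > rmp;
}
```

With these definitions, a kernel page at 0x1ff000 and a guest page at 0x1000 satisfy same_2m_region(), which is the kernel-stack case Dave describes, and needs_split(LEVEL_2M, LEVEL_4K) is true once the RMP entry has been smashed to 4K.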