linux-kernel - Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YZJTA1NyLCmVtGtY@work-vm>
Date:   Mon, 15 Nov 2021 12:30:59 +0000
From:   "Dr. David Alan Gilbert" <dgilbert@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...el.com>,
        Peter Gonda <pgonda@...gle.com>,
        Brijesh Singh <brijesh.singh@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <Thomas.Lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Andi Kleen <ak@...ux.intel.com>, tony.luck@...el.com,
        marcorr@...gle.com, sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP)
 Hypervisor Support

* Sean Christopherson (seanjc@...gle.com) wrote:
> On Fri, Nov 12, 2021, Borislav Petkov wrote:
> > On Fri, Nov 12, 2021 at 09:59:46AM -0800, Dave Hansen wrote:
> > > Or, is there some mechanism that prevent guest-private memory from being
> > > accessed in random host kernel code?
> 
> Or random host userspace code...
> 
> > So I'm currently under the impression that random host->guest accesses
> > should not happen if not previously agreed upon by both.
> 
> Key word "should".
> 
> > Because, as explained on IRC, if host touches a private guest page,
> > whatever the host does to that page, the next time the guest runs, it'll
> > get a #VC where it will see that that page doesn't belong to it anymore
> > and then, out of paranoia, it will simply terminate to protect itself.
> > 
> > So cloud providers should have an interest to prevent such random stray
> > accesses if they wanna have guests. :)
> 
> Yes, but IMO inducing a fault in the guest because of _host_ bug is wrong.

Would it necessarily have been a host bug?  A guest telling the host a
bad GPA to DMA into would trigger this wouldn't it?

Still; I wonder if it's best to kill the guest - maybe it's best for
the host to kill the guest and leave behind diagnostics of what
happened; for someone debugging the crash, it's going to be less useful
to know that page X was wrongly accessed (which is what the guest would
see), and more useful to know that it was the kernel's vhost-... driver
that accessed it.

Dave

> On Fri, Nov 12, 2021, Peter Gonda wrote:
> > Here is an alternative to the current approach: On RMP violation (host
> > or userspace) the page fault handler converts the page from private to
> > shared to allow the write to continue. This pulls from s390’s error
> > handling which does exactly this. See ‘arch_make_page_accessible()’.
> 
> Ah, after further reading, s390 does _not_ do implicit private=>shared conversions.
> 
> s390's arch_make_page_accessible() is somewhat similar, but it is not a direct
> comparison.  IIUC, it exports and integrity protects the data and thus preserves
> the guest's data in an encrypted form, e.g. so that it can be swapped to disk.
> And if the host corrupts the data, attempting to convert it back to secure on a
> subsequent guest access will fail.
> 
> The host kernel's handling of the "convert to secure" failures doesn't appear to
> be all that robust, e.g. it looks like there are multiple paths where the error
> is dropped on the floor and the guest is resumed , but IMO soft hanging the guest 
> is still better than inducing a fault in the guest, and far better than potentially
> coercing the guest into reading corrupted memory ("spurious" PVALIDATE).  And s390's
> behavior is fixable since it's purely a host error handling problem.
> 
> To truly make a page shared, s390 requires the guest to call into the ultravisor
> to make a page shared.  And on the host side, the host can pin a page as shared
> to prevent the guest from unsharing it while the host is accessing it as a shared
> page.
> 
> So, inducing #VC is similar in the sense that a malicious s390 can also DoS itself,
> but is quite different in that (AFAICT) s390 does not create an attack surface where
> a malicious or buggy host userspace can induce faults in the guest, or worst case in
> SNP, exploit a buggy guest into accepting and accessing corrupted data.
> 
> It's also different in that s390 doesn't implicitly convert between shared and
> private.  Functionally, it doesn't really change the end result because a buggy
> host that writes guest private memory will DoS the guest (by inducing a #VC or
> corrupting exported data), but at least for s390 there's a sane, legitimate use
> case for accessing guest private memory (swap and maybe migration?), whereas for
> SNP, IMO implicitly converting to shared on a host access is straight up wrong.
> 
> > Additionally it adds less complexity to the SNP kernel patches, and
> > requires no new ABI.
> 
> I disagree, this would require "new" ABI in the sense that it commits KVM to
> supporting SNP without requiring userspace to initiate any and all conversions
> between shared and private.  Which in my mind is the big elephant in the room:
> do we want to require new KVM (and kernel?) ABI to allow/force userspace to
> explicitly declare guest private memory for TDX _and_ SNP, or just TDX?
> 
-- 
Dr. David Alan Gilbert / dgilbert@...hat.com / Manchester, UK