linux-kernel - Re: [PATCH v12 22/46] x86/sev: Use SEV-SNP AP creation to start secondary CPUs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YkybTnVYBKZ1zvz6@google.com>
Date:   Tue, 5 Apr 2022 19:41:02 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Brijesh Singh <brijesh.singh@....com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        brijesh.ksingh@...il.com, tony.luck@...el.com, marcorr@...gle.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH v12 22/46] x86/sev: Use SEV-SNP AP creation to start
 secondary CPUs

On Tue, Apr 05, 2022, Brijesh Singh wrote:
> Hi Sean,
>
> On 4/4/22 19:24, Sean Christopherson wrote:
>
> > > +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa)
> > > +{
> > > + int err;
> > > +
> > > + err = snp_set_vmsa(vmsa, false);
> >
> > Uh, so what happens if a malicious guest does RMPADJUST to convert a VMSA page
> > back to a "normal" page while the host is trying to VMRUN that VMSA?  Does VMRUN
> > fault?
>
> When SEV-SNP is enabled, the VMRUN instruction performs an additional
> security checks on various memory pages. In the case of VMSA page, hardware
> enforce that page is marked as "VMSA" in the RMP table. If not,  VMRUN will
> fail with VMEXIT_INVALID.
>
> After the VMRUN is successful, the VMSA page is marked IN_USE by the
> hardware, any attempt to modify the RMP entries will result in FAIL_INUSE
> error. The IN_USE marking is automatically cleared by the hardware after the
> #VMEXIT.
>
> Please see the APM vol2 section 15.36.12 for additional information.

Thanks!

> > Can Linux refuse to support this madness and instead require the ACPI MP wakeup
> > protocol being proposed/implemented for TDX?  That would allow KVM to have at
>
> My two cents
>
> In the current architecture, the HV track VMSAs by their SPA and guest
> controls when they are runnable. It provides flexibility to the guest, which
> can add and remove the VMSA. This flexibility may come in handy to support
> the kexec and reboot use cases.

I understand it provides the guest flexibility, but IMO it completely inverts the
separation of concerns between host and guest.  The host should have control of
when a vCPU is added/removed and with what state, and the guest should be able to
verify/acknowledge any changes.  This scheme gives the guest the bulk of the control,
and doesn't even let the host verify much at all since the VMSA is opaque.

That the guest can yank the rug out from the host at any time just adds to the pain.
VMEXIT_INVALID isn't the end of the world, but it breaks the assumption that such
errors are host bugs.  To guard against such behavior, the host would have to unmap
the VMSA page in order to prevent unwanted RMPADJUST, and that gets ugly fast if a
VMSA can be any arbitrary guest page.

Another example is the 2mb alignment erratum.  Technically, the guest can't workaround
the erratum with 100% certainty because there's no guarantee that the host uses the
same alignment for gfns and pfns.  I don't actually expect a host to use unaligned
mappings, just pointing out how backwards this is.

I fully realize there's basically zero chance of getting any of this changed in
hardware/firmware, but I'm hoping we can concoct a software/GHCB solution to the
worst issues.

I don't see an way easy to address the guest getting to shove state directly into
the VMSA, but the location of the VMSA gfn/pfn is a very solvable problem.  E.g.
the host gets full control over each vCPU's VMSA, and the host-provided VMSA is
discoverable in the guest.  That allows the guest to change vCPU state, e.g. for AP
bringup, kexec, etc..., but gives the host the ability to protect itself without
having to support arbitrary VMSA pages.  E.g. the host can dynamically map/unmap the
VMSA from the guest: map on fault, unmap on AP "creation", refuse to run the vCPU if
its VMSA isn't in the unmap state.  The VMSA pfn is fully host controlled, so
there's no need for the guest to be aware of the 2mb alignment erratum.

Requiring such GHCB extensions in the guest would make Linux incompatible with
hypervisors that aren't updated, but IMO that's not a ridiculous ask given that
it would be in the best interested of any hypervisor that isn't running a fully
trusted, paravirt VMPL0.

> The current approach does not depend on
> ACPI; it will also come in handy to support microvm (minimalist machine type
> without PCI nor ACPI support).

Eh, a microvm really shouldn't need AP bringup in the first place, just run all
APs from time zero and route them to where they need to be.