linux-kernel - Re: [PATCH] KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure

Message-ID: <8f500f3d-d3a8-c873-50b0-d3cc72ddb372@amd.com>
Date:   Thu, 20 May 2021 16:04:16 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Peter Gonda <pgonda@...gle.com>, kvm list <kvm@...r.kernel.org>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Brijesh Singh <brijesh.singh@....com>
Subject: Re: [PATCH] KVM: SVM: Do not terminate SEV-ES guests on GHCB
 validation failure

On 5/20/21 3:22 PM, Sean Christopherson wrote:
> On Thu, May 20, 2021, Sean Christopherson wrote:
>> On Thu, May 20, 2021, Sean Christopherson wrote:
>>> On Mon, May 17, 2021, Tom Lendacky wrote:
>>>> On 5/14/21 6:06 PM, Peter Gonda wrote:
>>>>> On Fri, May 14, 2021 at 1:22 PM Tom Lendacky <thomas.lendacky@....com> wrote:
>>>>>>
>>>>>> Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
>>>>>> exit code and parameters fails. Since the VMGEXIT instruction can be issued
>>>>>> from userspace, even though userspace (likely) can't update the GHCB,
>>>>>> don't allow userspace to be able to kill the guest.
>>>>>>
>>>>>> Return a #GP request through the GHCB when validation fails, rather than
>>>>>> terminating the guest.
>>>>>
>>>>> Is this a gap in the spec? I don't see anything that details what
>>>>> should happen if the correct fields for NAE are not set in the first
>>>>> couple paragraphs of section 4 'GHCB Protocol'.
>>>>
>>>> No, I don't think the spec needs to spell out everything like this. The
>>>> hypervisor is free to determine its course of action in this case.
>>>
>>> The hypervisor can decide whether to inject/return an error or kill the guest,
>>> but what errors can be returned and how they're returned absolutely needs to be
>>> ABI between guest and host, and to make the ABI vendor agnostic the GHCB spec
>>> is the logical place to define said ABI.
>>>
>>> For example, "injecting" #GP if the guest botched the GHCB on #VMGEXIT(CPUID) is
>>> completely nonsensical.  As is, a Linux guest appears to blindly forward the #GP,
>>> which means if something does go awry KVM has just made debugging the guest that
>>> much harder, e.g. imagine the confusion that will ensue if the end result is a
>>> SIGBUS to userspace on CPUID.
>>>
>>> There needs to be an explicit error code for "you gave me bad data", otherwise
>>> we're signing ourselves up for future pain.
>>
>> More concretely, I think the best course of action is to define a new return code
>> in SW_EXITINFO1[31:0], e.g. '2', with additional information in SW_EXITINFO2.
>>
>> In theory, an old-but-sane guest will interpret the unexpected return code as
>> fatal to whatever triggered the #VMGEXIT, e.g. SIGBUS to userspace.  Unfortunately
>> Linux isn't sane because sev_es_ghcb_hv_call() assumes any non-'1' result means
>> success, but that's trivial to fix and IMO should be fixed irrespective of where
>> this goes.
> 
> One last thing (hopefully): Erdem pointed out that if the GHCB GPA (or any
> dereferenced pointers within the GHCB) is invalid or is set to a private GPA
> (mostly in the context of SNP) then the VMM will likely have no choice but to
> kill the guest in response to #VMGEXIT.
> 
> It's probably a good idea to add a blurb in one of the specs explicitly calling
> out that #VMGEXIT can be executed from userspace, and that before returning to
> userspace the guest kernel must always ensure that the GHCB points at a legal
> GPA _and_ all primary fields are marked invalid.

Yes, the spec can be updated to include a "best practices" section for
OSes and Hypervisors to follow without actually having to update the
version of the GHCB spec, so that should be doable.

Thanks,
Tom
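
A minimal sketch, in C, of the return-code handling discussed in this
thread: the guest decodes SW_EXITINFO1[31:0] and treats only a known
"no action" value as success, rather than assuming that any non-'1'
result means success. The names ghcb_result and ghcb_classify() are
hypothetical and are not taken from the kernel. The value 2 is the
return code proposed above, not something the GHCB spec defined at the
time of this message, and 0 meaning "no action requested" is an
assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical classification of a hypervisor response to #VMGEXIT. */
enum ghcb_result {
	GHCB_RESULT_OK,               /* assumed: 0, no action requested */
	GHCB_RESULT_INJECT_EXCEPTION, /* 1, hypervisor asks for an exception */
	GHCB_RESULT_BAD_INPUT,        /* proposed in this thread: 2 */
	GHCB_RESULT_UNKNOWN,          /* anything else: treat as fatal */
};

/*
 * Classify the return code held in the low 32 bits of SW_EXITINFO1.
 * The upper 32 bits are ignored here. Unknown codes are reported as
 * such so the caller can fail the operation instead of continuing.
 */
static enum ghcb_result ghcb_classify(uint64_t sw_exit_info_1)
{
	uint32_t code = (uint32_t)(sw_exit_info_1 & 0xffffffffu);

	switch (code) {
	case 0:
		return GHCB_RESULT_OK;
	case 1:
		return GHCB_RESULT_INJECT_EXCEPTION;
	case 2:
		return GHCB_RESULT_BAD_INPUT;
	default:
		return GHCB_RESULT_UNKNOWN;
	}
}
```

A caller would proceed only on GHCB_RESULT_OK and would read
SW_EXITINFO2 for details in the other cases. Defaulting to "unknown"
instead of "success" is what lets an older guest fail safely when the
host returns a code it does not recognize.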
