linux-kernel - Re: [PATCH Part2 RFC v4 38/40] KVM: SVM: Provide support for SNP_GUEST

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0641fdec-48a0-b3b7-9926-3ce5a6e53eb0@amd.com>
Date:   Tue, 20 Jul 2021 13:21:00 -0500
From:   Brijesh Singh <brijesh.singh@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>, tony.luck@...el.com,
        npmccallum@...hat.com, brijesh.ksingh@...il.com
Subject: Re: [PATCH Part2 RFC v4 38/40] KVM: SVM: Provide support for
 SNP_GUEST_REQUEST NAE event



On 7/20/21 11:28 AM, Sean Christopherson wrote:

> 
> Ah, I got confused by this code in snp_build_guest_buf():
> 
> 	data->req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> 
> I was thinking that setting the C-bit meant the memory was guest private, but
> that's setting the C-bit for the HPA, which is correct since KVM installs guest
> memory with C-bit=1 in the NPT, i.e. encrypts shared memory with the host key.
> 
> Tangetially related question, is it correct to say that the host can _read_ memory
> from a page that is assigned=1, but has asid=0?  I.e. KVM can read the response
> page in order to copy it into the guest, even though it is a firmware page?
> 

Yes. The firmware page means that x86 cannot write to it; the read is 
still allowed.


> 	/* Copy the response after the firmware returns success. */
> 	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> 
>> In the current series we don't support migration etc so I decided to
>> ratelimit unconditionally.
> 
> Since KVM can peek at the request header, KVM should flat out disallow requests
> that KVM doesn't explicitly support.  E.g. migration requests should not be sent
> to the PSP.
> 

That is acceptable.


> One concern though: How does the guest query what requests are supported?  This
> snippet implies there's some form of enumeration:
> 
>    Note: This guest message may be removed in future versions as it is redundant
>    with the CPUID page in SNP_LAUNCH_UPDATE (see Section 8.14).
> 
> But all I can find is a "Message Version" in "Table 94. Message Type Encodings",
> which implies that request support is all or nothing for a given version.  That
> would be rather unfortunate as KVM has no way to tell the guest that something
> is unsupported :-(
> 

The firmware supports all the commands listed in the spec. The HV 
support is always going to be behind what a firmware or hardware is 
capable of doing. As per the spec is concerned, it say

   The firmware checks that MSG_TYPE is a valid message type. The
   firmware then checks that MSG_SIZE is large enough to hold the
   indicated message type at the indicated message version. If
   not, the firmware returns INVALID_PARAM.

So, a hypervisor could potentially send the INVALID_PARAMS to indicate 
that guest that a message type is not supported.


>>> Is this exposed to userspace in any way?  This feels very much like a knob that
>>> needs to be configurable per-VM.
>>
>> It's not exposed to the userspace and I am not sure if userspace care about
>> this knob.
> 
> Userspace definitely cares, otherwise the system would need to be rebooted just to
> tune the ratelimiting.  And userspace may want to disable ratelimiting entirely,
> e.g. if the entire system is dedicated to a single VM.

Ok.

> 
>>> Also, what are the estimated latencies of a guest request?  If the worst case
>>> latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
>>> lot.
>>>
>>
>> The latency will depend on what else is going in the system at the time the
>> request comes to the hypervisor. Access to the PSP is serialized so other
>> parallel PSP command execution will contribute to the latency.
> 
> I get that it will be variable, but what are some ballpark latencies?  E.g. what's
> the latency of the slowest command without PSP contention?
> 

In my single VM, I am seeing latency of guest request to be around ~35ms.

>>> Question on the VMPCK sequences.  The firmware ABI says:
>>>
>>>      Each guest has four VMPCKs ... Each message contains a sequence number per
>>>      VMPCK. The sequence number is incremented with each message sent. Messages
>>>      sent by the guest to the firmware and by the firmware to the guest must be
>>>      delivered in order. If not, the firmware will reject subsequent messages ...
>>>
>>> Does that mean there are four independent sequences, i.e. four streams the guest
>>> can use "concurrently", or does it mean the overall freshess/integrity check is
>>> composed from four VMPCK sequences, all of which must be correct for the message
>>> to be valid?
>>>
>>
>> There are four independent sequence counter and in theory guest can use them
>> concurrently. But the access to the PSP must be serialized.
> 
> Technically that's not required from the guest's perspective, correct?  

Correct.

The guest
> only cares about the sequence numbers for a given VMPCK, e.g. it can have one
> in-flight request per VMPCK and expect that to work, even without fully serializing
> its own requests.
> 
> Out of curiosity, why 4 VMPCKs?  It seems completely arbitrary.
> 

I believe the thought process was by providing 4 keys it can provide 
flexibility for each VMPL levels to use a different keys (if they wish). 
The firmware does not care about the vmpl level during the guest request 
handling, it just want to know which key is used for encrypting the 
payload so that he can decrypt and provide the  response for it.


>> Currently, the guest driver uses the VMPCK0 key to communicate with the PSP.
>>
>>
>>> If it's the latter, then a traditional mutex isn't really necessary because the
>>> guest must implement its own serialization, e.g. it's own mutex or whatever, to
>>> ensure there is at most one request in-flight at any given time.
>>
>> The guest driver uses the its own serialization to ensure that there is
>> *exactly* one request in-flight.
> 
> But KVM can't rely on that because it doesn't control the guest, e.g. it may be
> running a non-Linux guest.
>

Yes, KVM should not rely on it. I mentioned that mainly because you said 
that guest must implement its own serialization. In the case of KVM, the 
CCP driver ensure that the command sent to the PSP is serialized.


>> The mutex used here is to protect the KVM's internal firmware response
>> buffer.
> 
> Ya, where I was going with my question was that if the guest was architecturally
> restricted to a single in-flight request, then KVM could do something like this
> instead of taking kvm->lock (bad pseudocode):
> 
> 	if (test_and_set(sev->guest_request)) {
> 		rc = AEAD_OFLOW;
> 		goto fail;
> 	}
> 
> 	<do request>
> 
> 	clear_bit(...)
> 
> I.e. multiple in-flight requests can't work because the guest can guarantee
> ordering between vCPUs.  But, because the guest can theoretically have up to four
> in-flight requests, it's not that simple.
> 
> The reason I'm going down this path is that taking kvm->lock inside vcpu->mutex
> violates KVM's locking rules, i.e. is susceptibl to deadlocks.  Per kvm/locking.rst,
> 
>    - kvm->lock is taken outside vcpu->mutex
> 
> That means a different mutex is needed to protect the guest request pages.
> 

Ah, I see your point on the locking. From architecturally a guest can 
issue multiple requests in parallel. It sounds like having a separate 
lock to protect the guest request pages makes sense.


-Brijesh