linux-kernel - Re: [PATCH] virt: Prevent AES-GCM IV reuse in SNP guest driver

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5621c2b6-a5eb-c786-afee-020e97c0e4c8@amd.com>
Date:   Wed, 19 Oct 2022 14:56:22 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Dionna Amalie Glaze <dionnaglaze@...gle.com>
Cc:     Peter Gonda <pgonda@...gle.com>, Borislav Petkov <bp@...e.de>,
        Michael Roth <michael.roth@....com>,
        Haowen Bai <baihaowen@...zu.com>,
        Yang Yingliang <yangyingliang@...wei.com>,
        Marc Orr <marcorr@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Ashish Kalra <Ashish.Kalra@....com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH] virt: Prevent AES-GCM IV reuse in SNP guest driver

On 10/19/22 14:17, Dionna Amalie Glaze wrote:
> On Wed, Oct 19, 2022 at 11:44 AM Tom Lendacky <thomas.lendacky@....com> wrote:
>>
>> On 10/19/22 12:40, Peter Gonda wrote:
>>> On Wed, Oct 19, 2022 at 11:03 AM Tom Lendacky <thomas.lendacky@....com> wrote:
>>>>
>>>> On 10/19/22 10:03, Peter Gonda wrote:
>>>>> The ASP and an SNP guest use a series of AES-GCM keys called VMPCKs to
>>>>> communicate securely with each other. The IV to this scheme is a
>>>>> sequence number that both the ASP and the guest track. Currently this
>>>>> sequence number in a guest request must exactly match the sequence
>>>>> number tracked by the ASP. This means that if the guest sees an error
>>>>> from the host during a request it can only retry that exact request or
>>>>> disable the VMPCK to prevent an IV reuse. AES-GCM cannot tolerate IV
>>>>> reuse see:
>>>>> https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/comments/800-38-series-drafts/gcm/joux_comments.pdf
>>>>
>>
>> I think I've wrapped my head around this now. Any non-zero return code
>> from the hypervisor for an SNP Guest Request is either a hypervisor error
>> or an sev-guest driver error, and so the VMPCK should be disabled. The
>> sev-guest driver is really doing everything (message headers, performing
>> the encryption, etc.) and is only using userspace data that will be part
>> of the response message and can't result in a non-zero hypervisor return code.
>>
>> For the SNP Extended Guest Request, we only need to special case a return
>> code of SNP_GUEST_REQ_INVALID_LEN. See below for my responses on that.
>>
>>
>>>> I wonder if we can at least still support the extended report length query
>>>> by having the kernel allocate the required pages when the error is
>>>> SNP_GUEST_REQ_INVALID_LEN and retry the exact request again. If there are
>>>> no errors on the second request, the sequence numbers can be safely
>>>> updated, but the kernel returns the original error (which will provide the
>>>> caller with the number of pages required).
>>>
>>> I think we can but I thought fixing the security bug could come first,
>>> then the usability fix after. Dionna was planning on working on that
>>> fix.
>>>
>>> In that flow how does userspace get the data? Its called the ioctl
>>> with not enough output buffer space. What if the userspace calls the
>>> ioctl with no buffers space allocated, so its trying to query the
>>> length. We just send the host the request without any encrypted data.
>>
>> In the case of SNP_GUEST_REQ_INVALID_LEN, userspace wouldn't get the data
>> if it hasn't supplied enough buffer space. But, the sev-guest driver can
>> supply enough buffer space and invoke the SNP Extended Guest Request again
>> in order to successfully complete the call and update the sequence
>> numbers. The sev-guest driver would just discard the data in this case,
>> but pass back the original "not enough buffer space" error to the caller,
>> who could now allocate space and retry. This then allows the sequence
>> numbers to be bumped properly.
>>
> 
> The way I thought to solve this was to make certificate length
> querying a part of the specified protocol.
> 
> The first ext_guest_request command /must/ query the certificate
> buffer length with req.certs_len == 0.

This becomes an incompatible change to the GHCB specification.

> By making this part of the protocol, the sev-guest driver can check if
> the certificate length has been requested before.
> If so, emulate the host's VMM error code for invalid length without
> sending an encrypted message.

On the hypervisor side, the certificate blob can be replaced at any time 
with a new blob that is larger. So you may still have to handle the case 
where you get a SNP_GUEST_REQ_INVALID_LEN even if you previously asked before.

> If not, then send an all zeroes request buffer with the req.certs_len
> = 0 values to the VMM.
> 
> The VMM will respond with the size if indeed the expected_pages are >
> 0. In the case that the host has not set the certificate buffer yet,
> then the host will inspect the header of the request page for a zero
> sequence number. If so, then we know that we don't have a valid
> request. We treat this also as the INVALID_LEN case but still return
> the size of 0. The driver will have the expected pages value stored as
> 0 at this point, so subsequent calls will not have this behavior.
> 
> The way /dev/sev-guest user code has been written, I don't think this
> will break any existing software package.

I think having the sev-guest driver re-issue the request with the internal 
buffer when it receives SNP_GUEST_REQ_INVALID_LEN is the better way to go. 
You could still cache the size request and always return that to 
user-space when a request is received with a 0 length. The user-space 
program must be able to handle receiving multiple 
SNP_GUEST_REQ_INVALID_LEN in succession anyway, because of the fact that 
the hypervisor can be updating the certs asynchronously. And if you get a 
request that is not 0 length, then you issue it as such and re-use the 
logic of the first 0 length request that was received if you get an 
SNP_GUEST_REQ_INVALID_LEN with the user-space supplied value.

Peter, is this something you could change the patch to do?

> 
>>>
>>>>
>>>> For the rate-limiting patch series [1], the rate-limiting will have to be
>>>> performed within the kernel, while the mutex is held, and then retry the
>>>> exact request again. Otherwise, that error will require disabling the
>>>> VMPCK. Either that, or the hypervisor must provide the rate limiting.
>>>>
>>>> Thoughts?
>>>>
>>>> [1] https://lore.kernel.org/lkml/20221013160040.2858732-1-dionnaglaze@google.com/
>>>
>>> Yes I think if the host rate limits the guest. The guest kernel should
>>> retry the exact message. Which mutex are you referring too?
>>
>> Or the host waits and then submits the request and the guest kernel
>> doesn't have to do anything. The mutex I'm referring to is the
>> snp_cmd_mutex that is taken in snp_guest_ioctl().
> 
> I think that either the host kernel or guest kernel waiting can lead
> to unacceptable delays.
> I would recommend that we add a zero argument ioctl to /dev/sev-guest
> specifically for retrying the last request.
> 
> We can know what the last request is due to the sev_cmd_mutex serialization.
> The driver will just keep a scratch buffer for this. Any other request
> that comes in without resolving the retry will get an -EBUSY error
> code.

And the first caller will have received an -EAGAIN in order to 
differentiate between the two situations?

> 
> Calling the retry ioctl without a pending command will result in -EINVAL.
> 
> Let me know what you think.

I think that sounds reasonable, but there are some catches. You will need 
to ensure that the caller that is supposed to retry does actually retry 
and that a caller that does retry is the same caller that was told to retry.

Thanks,
Tom

>>
>> Thanks,
>> Tom
> 
> 
>