linux-kernel - Re: [PATCH v8 1/4] crypto: ccp - Name -1 return value as SEV_RET_NO_FW

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c23f2bf3-71f4-2346-b940-73df1e173ba1@amd.com>
Date:   Tue, 6 Dec 2022 14:36:32 -0600
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Dionna Amalie Glaze <dionnaglaze@...gle.com>,
        Borislav Petkov <bp@...en8.de>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Peter Gonda <pgonda@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Joerg Roedel <jroedel@...e.de>, Ingo Molnar <mingo@...hat.com>,
        Andy Lutomirsky <luto@...nel.org>,
        John Allen <john.allen@....com>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH v8 1/4] crypto: ccp - Name -1 return value as
 SEV_RET_NO_FW_CALL

On 12/5/22 11:05, Dionna Amalie Glaze wrote:
> On Sat, Dec 3, 2022 at 11:37 AM Borislav Petkov <bp@...en8.de> wrote:
>>
>> On Sat, Dec 03, 2022 at 10:58:39AM -0800, Dionna Amalie Glaze wrote:
>>> It doesn't always overwrite psp_ret, such as the initial error checking.
>>> The value remains uninitialized for -ENODEV, -EBUSY, -EINVAL.
>>> Thus *error in __sev_platform_init_locked can be set to uninitialized
>>> memory if psp_ret is not first initialized.
>>
>> Lemme see if I understand it correctly: you wanna signal that all early
>> return cases in __sev_do_cmd_locked() are such that no firmware was
>> called?
>>
>> I.e., everything before the first iowrite into the command buffer?
>>
>> But then the commit message says:
>>
>> "The PSP can return a "firmware error" code of -1 in circumstances where
>> the PSP is not actually called."
>>
>> which is confusing. How can the PSP return something if it wasn't called?
>>
>> Or you mean those cases above where it would fail on some of the checks
>> before issuing a SEV command? I think you do...
>>
>> So I see Tom has ACKed this but I have to ask: is the SEV spec not going
>> to use -1 ever?
>>
> 
> I'll confirm with Tom, since he's changing the GHCB spec for the
> throttling value.

The SEV API error codes are 16-bits in size, so you'll never see a -1.

> 
>> Also, if this behavior is going to be user-visible, where are we
>> documenting it? Especially if nothing in the kernel is looking at
>> that value but only assigning it to a retval which gets looked at by
>> userspace. Especially then this should be documented.
>>
>> Dunno, maybe somewhere in Documentation/x86/amd-memory-encryption.rst or
>> maybe Tom would have a better idea.
>>
> 
> Agreed it should be in both the Linux documentation and the GHCB spec.

Linux documentation, yes, GHCB spec, no.

Thanks,
Tom

> 
>>> That error points to the kernel copy of the user's argument struct,
>>> which the ioctl always copies back. In the case of those error codes
>>> then, without SEV_RET_NO_FW_CALL, user space will get uninitialized
>>> kernel memory.
>>
>> Right, but having a return value which means "firmware wasn't called"
>> sounds weird. Why does userspace care?
>>
> 
> Arguably it shouldn't ever get this value. We're just not very
> selective when we copy back the kernel copy of the ioctl argument.
> In all cases user space should treat the value as undefined, but still
> we don't want to leak uninitialized kernel stack values.
> 
> Host driver: only on platform init, should just see the negative error
> value and not try to interpret the fw_err in the argument.
> Still the data is copied back and therefore should not be
> uninitialized kernel memory.
> Possible name: SEV_RET_UNDEFINED, or a return value -1 anyway with a
> comment that the argument is undefined.
> 
> Guest driver: The host is issuing a guest request on behalf of the
> guest using patch 4/4 of this series.
> The guest is responsible for keeping the sequence number in sync with
> the PSP, so we want to track if the ghcb_hv_call completed
> successfully to know we should continue with the incremented IV.
> Otherwise we run the risk of the sequence numbers getting out of sync
> and we lock down the VMPCK.
> 
> The guest driver actually sets exitinfo2 to  an undocumented 0xff
> initial value just in case.
> =If the host doesn't write back a documented EXIT_INFO_2 value like
> invalid_len or throttled, then the kernel will emit a log with the
> initial value 0xff (or -1 after this patch).
> 
> I've changed it to -1 to name the same kind of error across host and
> guest: the communication with the PSP didn't complete successfully, so
> the "error" value is not from the PSP.
> This value can also get returned to user space during a -ENOTTY result.
> We can call this NO_FW_CALL or UNDEFINED. I have no real preference.
> 
> Whatever value we set initially, the VMM can overwrite exitinfo2
> during the ghcb_hv_call.
> I'd rather that the "undefined" values were the same across both,
> because the guest is merely receiving a value from the host's PSP
> driver (or should be).
> It keeps the enum for return values a bit tidier and not concerned
> with whether the value is viewed from the host or guest.
> 
> I can see an argument for not using the PSP header for its enum type
> and instead defining and documenting and using the separate the 0xff
> value elsewhere, but this seemed as good a place as any.
> 
> 
>> I mean, you can just as well return any of the negative values -ENODEV,
>> -EBUSY, -EINVAL too, depending on where you exit. Having three different
>> retvals could tell you where exactly it failed, even.
>>
> 
> That's true, those values are already being returned to user space as
> the result of the ioctl.
> 
>> But the question remains: why does userspace needs to know that the
>> failure happened and firmware wasn't called, as long as it is getting
>> something negative to signal an error?
>>
> 
> I hope the above discussion is clear that it's purely a defined
> "undefined" because being pickier about what to copy_to_user during
> exceptional circumstances in order to not overwrite the user's fw_err
> value seems an unnecessary amount of code.
> 
>> Thx.
>>
>> --
>> Regards/Gruss,
>>      Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette
> 
> 
>