[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzTMlcoU4uOijVxt@google.com>
Date: Wed, 13 Nov 2024 07:58:13 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Dionna Glaze <dionnaglaze@...gle.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
Ashish Kalra <ashish.kalra@....com>, Tom Lendacky <thomas.lendacky@....com>,
John Allen <john.allen@....com>, Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>, linux-coco@...ts.linux.dev,
Paolo Bonzini <pbonzini@...hat.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Michael Roth <michael.roth@....com>, Luis Chamberlain <mcgrof@...nel.org>,
Russ Weight <russ.weight@...ux.dev>, Danilo Krummrich <dakr@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki" <rafael@...nel.org>,
Tianfei zhang <tianfei.zhang@...el.com>, Alexey Kardashevskiy <aik@....com>, linux-crypto@...r.kernel.org
Subject: Re: [PATCH v6 5/8] crypto: ccp: Add GCTX API to track ASID assignment
On Tue, Nov 12, 2024, Dionna Glaze wrote:
> @@ -109,6 +110,10 @@ static void *sev_init_ex_buffer;
> */
> static struct sev_data_range_list *snp_range_list;
>
> +/* SEV ASID data tracks resources associated with an ASID to safely manage operations. */
> +struct sev_asid_data *sev_asid_data;
> +u32 nr_asids, sev_min_asid, sev_es_max_asid;
> +
> static inline bool sev_version_greater_or_equal(u8 maj, u8 min)
> {
> struct sev_device *sev = psp_master->sev_data;
> @@ -1093,6 +1098,109 @@ static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg)
> return 0;
> }
>
> +static bool sev_check_external_user(int fd);
> +void *sev_snp_create_context(int fd, int asid, int *psp_ret)
> +{
> + struct sev_data_snp_addr data = {};
> + void *context;
> + int rc, error;
> +
> + if (!sev_check_external_user(fd))
> + return ERR_PTR(-EBADF);
> +
> + if (!sev_asid_data)
> + return ERR_PTR(-ENODEV);
> +
> + if (asid < 0 || asid >= nr_asids)
> + return ERR_PTR(-EINVAL);
> +
> + /* Can't create a context for a used ASID. */
> + if (WARN_ON_ONCE(sev_asid_data[asid].snp_context))
> + return ERR_PTR(-EBUSY);
Tracking contexts in an array that's indexed per ASID is unsafe and unnecessarily
splits ASID management across KVM and the PSP driver. There is zero reason the
PSP driver needs to care about ASIDs. Attempting to police KVM is futile, and
leads to bloated, convoluted code.
AFAICT, there is nothing to guard against a use-after-free in
snp_update_guest_contexts(). The need to handle SEV_RET_INVALID_GUEST is a pretty
big clue that there are races between KVM and firmware updates.
if (!sev_asid_data[i].snp_context)
continue;
status_data.gctx_paddr = __psp_pa(sev_asid_data[i].snp_context);
status_data.address = __psp_pa(snp_guest_status);
rc = sev_do_cmd(SEV_CMD_SNP_GUEST_STATUS, &status_data, &error);
if (!rc)
continue;
/*
* Handle race with SNP VM being destroyed/decommissoned,
* if guest context page invalid error is returned,
* assume guest has been destroyed.
*/
if (error == SEV_RET_INVALID_GUEST)
continue;
Using an array is also inefficient, as it requires iterating over all possible
ASIDs, many of which may be unused.
Furthermore, handling this in the PSP driver (correctly) leads to unnecessary
locking. KVM already protects SNP ASID allocations with sev_deactivate_lock, I
see zero reason to complicate things with another lock.
The "rollback" mechanism is also broken. If SEV_CMD_SNP_GUEST_STATUS fails,
synthetic_restore_required is set and never cleared, and impacts *all* SEV PSP
commands. I.e. failure to update one guest context comletely cripples the entire
system. Not to mention synthetic_restore_required also lacks any form of SMP
synchronication.
I also don't see how a rollback is possible if an error occurs after one or more
guest contexts have been updated. Presumably trying to rollback in that state
will leave the updated guests in a bad state. Of course, I don't see any rollback
code as nothing ever clears synthetic_restore_required, so what's intented to
happen is entirely unclear.
I also don't see anything in this series that explains why a SEV_CMD_SNP_GUEST_STATUS
failure shouldn't be treated as a fatal error. Of the error codes listed in the
SNP ABI, everything except UPDATE_FAILED is clearly a software bug. And I can't
find anything that explains when UPDATE_FAILED will be returned.
Table 80. Status Codes for SNP_GUEST_STATUS
Status Condition
SUCCESS Successful completion.
INVALID_PLATFORM_STATE The platform is not in the INIT state.
INVALID_ADDRESS The address is invalid for use by the firmware.
INVALID_PARAM MBZ fields are not zero.
INVALID_GUEST The guest context page was invalid.
INVALID_PAGE_STATE The guest status page was not in the correct state.
INVALID_PAGE_SIZE The guest status page was not the correct size.
UPDATE_FAILED Update of the firmware internal state or a guest context page has failed.
Somewhat off the cuff, I think the only sane way to approach this is to call into
KVM when doing a firmware update, and let KVM react accordingly. E.g. let KVM
walk its list of VMs in order to update SNP VMs, taking kvm_lock and the somewhat
misnamed sev_deactivate_lock() as needed. Then if updating a guest context fails,
terminate _that_ VM, and move on to the next VM.
Side topic, I don't see any code that ensures no SEV or SEV-ES VMs are running.
Is the idea to let userspace throw noodles at the PSP and see what sticks?
+ Provide support for AMD Platform Security Processor firmware.
+ The PSP firmware can be updated while no SEV or SEV-ES VMs are active.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ Users of this feature should be aware of the error modes that indicate
+ required manual rollback or reset due to instablity.
Powered by blists - more mailing lists