Message-ID: <6f5af820-5ccf-92e6-1acd-b87aef9885e6@amd.com>
Date: Mon, 23 Jun 2025 14:50:07 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: Sean Christopherson <seanjc@...gle.com>,
 Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Jim Mattson <jmattson@...gle.com>,
 Maxim Levitsky <mlevitsk@...hat.com>, Vitaly Kuznetsov
 <vkuznets@...hat.com>, Rik van Riel <riel@...riel.com>, x86@...nel.org,
 kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of
 ASID->VMCB

On 6/20/25 18:13, Sean Christopherson wrote:
> On Wed, Mar 26, 2025, Yosry Ahmed wrote:
>> SEV currently tracks the ASID to VMCB mapping for each physical CPU.
>> This is required to flush the ASID when a new VMCB using the same ASID
>> is run on the same CPU. Practically, there is a single VMCB for each
>> vCPU using SEV. Furthermore, TLB flushes on nested transitions between
>> VMCB01 and VMCB02 are handled separately (see
>> nested_svm_transition_tlb_flush()).
>>
>> In preparation for generalizing the tracking and making the tracking
>> more expensive, start tracking the ASID to vCPU mapping instead. This
>> will allow for the tracking to be moved to a cheaper code path when
>> vCPUs are switched.
>>
>> Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
>> ---
>>  arch/x86/kvm/svm/sev.c | 12 ++++++------
>>  arch/x86/kvm/svm/svm.c |  2 +-
>>  arch/x86/kvm/svm/svm.h |  4 ++--
>>  3 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index d613f81addf1c..ddb4d5b211ed7 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>>  
>>  	for_each_possible_cpu(cpu) {
>>  		sd = per_cpu_ptr(&svm_data, cpu);
>> -		sd->sev_vmcbs[sev->asid] = NULL;
>> +		sd->sev_vcpus[sev->asid] = NULL;
>>  	}
>>  
>>  	mutex_unlock(&sev_bitmap_lock);
>> @@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
>>  	if (!sev_enabled)
>>  		return 0;
>>  
>> -	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
>> -	if (!sd->sev_vmcbs)
>> +	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
>> +	if (!sd->sev_vcpus)
>>  		return -ENOMEM;
>>  
>>  	return 0;
>> @@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>>  	/*
>>  	 * Flush guest TLB:
>>  	 *
>> -	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
>> +	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
> 
> Tom, can you clarify what an ASID actually tags when NPT is in use?

I ran your questions by David Kaplan and hopefully his responses will
help clear things up.

> 
> The more I think about all of this, the less it makes sense.  The *entire* point
> of an ASID is to tag TLB entries so that a flush isn't required when running code
> for the same address space.
> 
> The main problem I'm struggling with is that, as usual, the APM doesn't properly
> document anything, and just gives "suggestions" for the VMM.  *sigh*
> 
> As I read it, these snippets from the APM are saying ASIDs tag only GPA=>PA entries
> when NPT is in use.
> 
>   TLB entries are tagged with Address Space Identifier (ASID) bits to distinguish
>   different guest virtual address spaces when shadow page tables are used, or
>   different guest physical address spaces when nested page tables are used. The
>   VMM can choose a software strategy in which it keeps multiple shadow page tables,
>   and/or multiple nested page tables in processors that support nested paging,
>   up-to-date; the VMM can allocate a different ASID for each shadow or nested
>   page table. This allows switching to a new process in a guest under shadow
>   paging (changing CR3 contents), or to a new guest under nested paging (changing
>   nCR3 contents), without flushing the TLBs.
> 
>   Note that because an ASID is associated with the guest's physical address
>   space, it is common across all of the guest's virtual address spaces within a
>   processor. This differs from shadow page tables where ASIDs tag individual
>   guest virtual address spaces. Note also that the same ASID may or may not be
>   associated with the same address space across all processors in a
>   multiprocessor system, for either nested tables or shadow tables; this depends
>   on how the VMM manages ASID assignment.
> 
> But then the "15.16.1 TLB Flush" section says this, without any qualification
> whatsoever that it applies only to shadow paging.
> 
>   A MOV-to-CR3, a task switch that changes CR3, or clearing or setting CR0.PG or
>   bits PGE, PAE, PSE of CR4 affects only the TLB entries belonging to the current
>   ASID, regardless of whether the operation occurred in host or guest mode. The
>   current ASID is 0 when the CPU is not inside a guest context.
> 
> And honestly, tagging only GPA=>PA entries doesn't make any sense, because
> GVA=>GPA needs to be tagged with *something*.  And the APM doesn't say anything
> about caching GPA=>PA translations, only about caching VA=>PA.

VA=>PA translations are always tagged with a TLB tag value.  Outside of
SEV-SNP, the TLB tag value is the ASID.

So for those guests, VA=>PA translations are tagged with the ASID.  For
SEV-SNP guests, see below.

> 
> The thing that doesn't fit is that SEV+ uses ASIDs on a per-VM basis.  I suggested
> per-VM ASIDs for all VM types based solely on that fact, but now I'm wondering if
> it's SEV+ that's crazy and broken.  Because if ASIDs also tag GVA=>GPA, then SEV has
> a massive architectural security hole, e.g. a malicious hypervisor can coerce the
> CPU into using a stale GVA=>GPA TLB entry by switching vCPUs and letting a guest
> process with CR3=x access memory belonging to a guest process with CR3=y.  But again, if
> ASIDs don't tag GVA=>GPA, then what provides isolation between vCPUs!?!?!

No.

For SEV/SEV-ES guests, the HV (which remains partially trusted) must do a
TLB flush before running a different VMCB of the same guest, in order to
avoid this problem. This code is in pre_sev_run().

For SEV-SNP guests, this is handled automatically by hardware through the
PCPU_ID and TLB_ID VMSA fields (documented somewhat in APM 15.36.15).

In short, the TLB is tagged with {TLB_ID, ASID} and TLB_ID is managed by
HW and guaranteed to be different for each vCPU of the guest running on a
physical core. This ensures that the TLB tag is unique for each guest and
for each vCPU of the guest.

> 
> Assuming ASIDs tag VA=>PA (i.e. the combined GVA=>GPA=>PA translation), then I
> take back my suggestion to use per-VM ASIDs, and instead propose we treat ASIDs
> like VPIDs, i.e. use per-vCPU ASIDs (well, technically per-VMCB, because nested).
> All server SKUs I've checked (Rome, Milan, Genoa, and Turin) support 0x8000 ASIDs.
> That makes the ~512 ASIDs reserved for SEV+ a drop in the bucket, i.e. still leaves
> ~32k ASIDs up for grabs, which means KVM can concurrently run ~16k vCPUs *with*
> nested VMs without having to fallback to flushing when scheduling in a new vCPU.
> 
> That way we don't need the complexity of the xarray ASID=>vCPU tracking for common
> code.  And as a bonus, the logic for VMX vs. SVM is very, very similar.
> 
> Given that SEV+ uses the ASID as the handle for the guest's encryption key,
> changing that behavior isn't an option.  Though I still don't see how that isn't
> a security flaw.  Regardless, I definitely don't think it's something we should
> follow.

I don't object to the above, there are plenty of ASIDs. Per-VMCB ASIDs
seem fine. I suspect the per-pCPU scoping of ASIDs dates back to the era
when there weren't many ASIDs. But now that everything supports 32k,
that's not an issue.

Note that SEV reserves 1006 ASIDs in Genoa/later, not 512.

Thanks,
Tom

