linux-kernel - Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5A53CDE0.7030607@ORACLE.COM>
Date:   Mon, 08 Jan 2018 22:00:32 +0200
From:   Liran Alon <LIRAN.ALON@...CLE.COM>
To:     Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
CC:     jmattson@...gle.com, aliguori@...zon.com, thomas.lendacky@....com,
        dwmw@...zon.co.uk, bp@...en8.de
Subject: Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU



On 08/01/18 20:08, Paolo Bonzini wrote:
> From: Tom Lendacky <thomas.lendacky@....com>
>
> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
> going to run a VCPU different from what was previously run.  Nested
> virtualization uses the same VMCB for the second level guest, but the
> L1 hypervisor should be using IBRS to protect itself.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@....com>
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> ---
>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 779981a00d01..dd744d896cec 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
>   module_param(vgif, int, 0444);
>
>   static bool __read_mostly have_spec_ctrl;
> +static bool __read_mostly have_ibpb_support;
>
>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
> @@ -540,6 +541,7 @@ struct svm_cpu_data {
>   	struct kvm_ldttss_desc *tss_desc;
>
>   	struct page *save_area;
> +	struct vmcb *current_vmcb;
>   };
>
>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
>   		pr_info("kvm: SPEC_CTRL available\n");
>   	else
>   		pr_info("kvm: SPEC_CTRL not available\n");
> +	have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
> +	if (have_ibpb_support)
> +		pr_info("kvm: IBPB_SUPPORT available\n");
> +	else
> +		pr_info("kvm: IBPB_SUPPORT not available\n");
>
>   	return 0;
>
> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>   	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>   	kvm_vcpu_uninit(vcpu);
>   	kmem_cache_free(kvm_vcpu_cache, svm);
> +
> +	/*
> +	 * The VMCB could be recycled, causing a false negative in
> +	 * svm_vcpu_load; block speculative execution.
> +	 */
> +	if (have_ibpb_support)
> +		wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>   }
>
>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>   	int i;
>
>   	if (unlikely(cpu != vcpu->cpu)) {
> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   	if (static_cpu_has(X86_FEATURE_RDTSCP))
>   		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> +	if (sd->current_vmcb != svm->vmcb) {
> +		sd->current_vmcb = svm->vmcb;
> +		if (have_ibpb_support)
> +			wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> +	}
> +
>   	avic_vcpu_load(vcpu, cpu);
>   }
>
> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>   	if (!nested_vmcb)
>   		return 1;
>
> +	/*
> +	 * No need for IBPB here, the L1 hypervisor should be running with
> +	 * IBRS=1 and inserts one already when switching L2 VMs.
> +	 */
> +

I am not fully convinced yet this is OK from security perspective.
 From the CPU point-of-view, both L1 & L2 run in the same 
prediction-mode (as they are both running in SVM guest-mode). Therefore, 
IBRS will not hide the BHB & BTB of L1 from L2.

This is how I understand things:
1. When transition between contexts in same prediction-mode, software is 
responsible for issuing an IBPB to basically "delete" the "branch 
prediction data" of the previous context such that the new context won't 
be able to use it.
This is orthogonal to IBRS which makes sure new context won't use 
"branch prediction data" that was created by a previous less-privileged 
prediction-mode as we are talking about transitioning between 2 contexts 
in same physical prediction-mode.
2. Because of (1), KVM was modified to issue IBPB when replacing active 
VMCB/VMCS on CPU.
3. For the nested-virtualization case, L1 & L2 both run in same physical 
prediction-mode. As from physical CPU perspective, they are both running 
in SVM guest-mode. Therefore, L0 should issue IBPB when transitioning 
from L1 to L2 and vice-versa.
4. In nested VMX case, this already happens because transitioning 
between L1 & L2 involves changing active VMCS on CPU (from vmcs01 to 
vmcs02) which already issues an IBPB.
5. However, nested SVM is reusing the same VMCB when transitioning 
between L1 & L2 and therefore we should explicitly issue an IBPB in 
nested_svm_vmrun() & nested_svm_vmexit().
6. One can implicitly assume that L1 hypervisor did issued a IBPB when 
loading an VMCB and therefore it doesn't have to do it again in L0.
However, I see 2 problems with this approach:
(a) We don't protect old non-patched L1 hypervisors from Spectre even 
though we could easily do that from L0 with small performance hit.
(b) When L1 hypervisor runs only a single L2 VM, it doesn't have to 
issue an IBPB when loading the VMCB (as it didn't run any other VMCB 
before) and it should be sufficient from L1 perspective to just write 1 
to IBRS. However, in our nested-case, this is a security-hole.

Am I misunderstanding something?

Regards,
-Liran

>   	/* Exit Guest-Mode */
>   	leave_guest_mode(&svm->vcpu);
>   	svm->nested.vmcb = 0;
> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>   	if (!nested_vmcb)
>   		return false;
>
> +	/*
> +	 * No need for IBPB here, since the nested VM is less privileged.  The
> +	 * L1 hypervisor inserts one already when switching L2 VMs.
> +	 */
> +
>   	if (!nested_vmcb_checks(nested_vmcb)) {
>   		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
>   		nested_vmcb->control.exit_code_hi = 0;
>