linux-kernel - Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load

Message-ID: <C85C63A5-D5F5-4E5D-B516-BD27FC56D06D@nutanix.com>
Date:   Thu, 12 May 2022 20:31:27 +0000
From:   Jon Kohler <jon@...anix.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     Jon Kohler <jon@...anix.com>, Jonathan Corbet <corbet@....net>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Kees Cook <keescook@...omium.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Kim Phillips <kim.phillips@....com>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ashok Raj <ashok.raj@...el.com>,
        KarimAllah Ahmed <karahmed@...zon.de>,
        David Woodhouse <dwmw@...zon.co.uk>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "kvm @ vger . kernel . org" <kvm@...r.kernel.org>,
        Waiman Long <longman@...hat.com>
Subject: Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load



> On May 12, 2022, at 4:07 PM, Sean Christopherson <seanjc@...gle.com> wrote:
> 
> On Thu, May 12, 2022, Jon Kohler wrote:
>> 
>> 
>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@...gle.com> wrote:
>>> 
>>> On Thu, May 12, 2022, Sean Christopherson wrote:
>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>>>> attack surface is already covered by switch_mm_irqs_off() ->
>>>>> cond_mitigation().
>>>>> 
>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>>>> wrong in its guest-to-guest design intention. There are three scenarios
>>>>> at play here:
>>>> 
>>>> Jim pointed out offline that there's a case we didn't consider.  When switching between
>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>>>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
>>>> being swapped out for vCPU1 on a single pCPU.
>>>> 
>>>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
>>>> most definitely needs to be updated.
>>>> 
>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>>>> use cases where a single VM is running a single workload.
>>> 
>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
>>> because then the IBPB is fully redundant with respect to any IBPB performed by
>>> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
>>> 
>>> That would also sidestep the largely theoretical question of whether vCPUs from
>>> different VMs but the same address space are in the same security domain.  It doesn't
>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
>> 
>> So should we go back to the earlier approach where we have it be only 
>> IBPB on always_ibpb? Or what?
>> 
>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
>> IBPB’ing even when the user did not explicitly tell us to.
> 
> I think we need separate controls for the guest.  E.g. if the userspace VMM is
> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> mean that the entire guest it's running is sufficiently hardened.

What if we keyed off the MSR bitmap, such that if a guest has *ever* issued an
IBPB, KVM does an IBPB on vCPU switch? We already disable interception of the
guest's PRED_CMD writes today, so we have the data, just like we do for SPEC_CTRL.

    if (prev != vmx->loaded_vmcs->vmcs) {
        per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
        vmcs_load(vmx->loaded_vmcs->vmcs);

        /*
         * No indirect branch prediction barrier needed when switching
         * the active VMCS within a guest, e.g. on nested VM-Enter.
         * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
         * We'll only issue this IBPB if the guest itself has ever issued
         * an IBPB, which would indicate they care about prediction barriers
         * on one or more task(s) within the guest. This guards against the
         * scenario where the guest has separate security domains on separate
         * vCPUs, and the kernel switches vCPU-x out for vCPU-y on the same
         * pCPU, before the guest has the chance to issue its own barrier.
         * In this scenario, the switch_mm() -> cond_mitigation would not
         * issue its own barrier, because the vCPUs are sharing a mm_struct.
         */
        if ((!buddy || WARN_ON_ONCE(buddy->vmcs != prev)) &&
            !msr_write_intercepted(vmx, MSR_IA32_PRED_CMD))
            indirect_branch_prediction_barrier();
    }

If the guest isn’t ever issuing IBPB, then one could say that it does not care
about the vCPU-to-vCPU attack surface.
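
To make the proposed policy easier to reason about outside the kernel, here is
a minimal user-space model of the decision. It is only a sketch: struct
model_mm, struct model_vcpu, pred_cmd_passthrough and vcpu_load_needs_ibpb()
are hypothetical names, not kernel symbols, and the model assumes that a
change of mm_struct is left entirely to the host's switch_mm_irqs_off() ->
cond_mitigation() policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space; not the kernel's mm_struct. */
struct model_mm {
    int id;
};

/* Hypothetical stand-in for the per-vCPU state that matters here. */
struct model_vcpu {
    const struct model_mm *mm;  /* address space of the task backing the vCPU */
    bool pred_cmd_passthrough;  /* guest has issued IBPB, so PRED_CMD writes
                                 * are no longer intercepted (assumption) */
};

/*
 * Decide whether loading 'next' on a pCPU that last ran 'prev' should issue
 * an IBPB from the vCPU-load path in this model.
 */
static bool vcpu_load_needs_ibpb(const struct model_vcpu *prev,
                                 const struct model_vcpu *next)
{
    /* Nothing ran before, or the same vCPU is being reloaded. */
    if (prev == NULL || prev == next)
        return false;

    /*
     * Different mm_structs: the mm switch path applies the host's own
     * mitigation policy, so a barrier here would be redundant.
     */
    if (prev->mm != next->mm)
        return false;

    /*
     * Same mm_struct: the mm switch path sees no change. Only issue the
     * barrier if the guest has shown that it uses IBPB itself.
     */
    return next->pred_cmd_passthrough;
}
```

With two vCPUs sharing one model_mm, the function returns true only when the
incoming vCPU has pred_cmd_passthrough set; with different model_mm pointers
it always returns false and defers to the mm switch path.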

Thoughts?

> 
>> That said, since I just re-read the documentation today, it does specifically
>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
>> STIBP (or other mitigations galore), so I think we end up having to think
>> about what our “contract” is with users who host their workloads on
>> KVM - are they expecting us to protect them in any/all cases?
>> 
>> Said another way, the internal guest areas of concern aren’t something
>> the kernel would always be able to A) identify far in advance and B)
>> always solve on the user's behalf. There is an argument to be made
>> that the guest needs to deal with its own house, yea?
> 
> The issue is that the guest won't get a notification if vCPU0 is replaced with
> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
> owner of the host is also the owner of the guest workload, the safe approach is to
> assume the guest is vulnerable.