linux-kernel - Re: [PATCH] x86/speculation, KVM: respect user IBPB configuration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yl2RnIjUTfQ0Avc9@google.com>
Date:   Mon, 18 Apr 2022 16:28:12 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Jon Kohler <jon@...anix.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Balbir Singh <sblbir@...zon.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Kim Phillips <kim.phillips@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        "kvm @ vger . kernel . org" <kvm@...r.kernel.org>,
        Kees Cook <keescook@...omium.org>,
        Waiman Long <longman@...hat.com>,
        Bijan Mottahedeh <bijan.mottahedeh@...anix.com>
Subject: Re: [PATCH] x86/speculation, KVM: respect user IBPB configuration

On Fri, Apr 15, 2022, Jon Kohler wrote:
> 
> > On Apr 15, 2022, at 10:28 AM, Sean Christopherson <seanjc@...gle.com> wrote:
> > But stepping back, why does KVM do its own IBPB in the first place?  The goal is
> > to prevent one vCPU from attacking the next vCPU run on the same pCPU.  But unless
> > userspace is running multiple VMs in the same process/mm_struct, switching vCPUs,
> > i.e. switching tasks, will also switch mm_structs and thus do IPBP via cond_mitigation.
> 
> Good question, I couldn’t figure out the answer to this by walking the code and looking
> at git history/blame for this area. Are there VMMs that even run multiple VMs within
> the same process? The only case I could think of is a nested situation?

Selftests? :-)

> > If userspace runs multiple VMs in the same process, enables cond_ipbp, _and_ sets
> > TIF_SPEC_IB, then it's being stupid and isn't getting full protection in any case,
> > e.g. if userspace is handling an exit-to-userspace condition for two vCPUs from
> > different VMs, then the kernel could switch between those two vCPUs' tasks without
> > bouncing through KVM and thus without doing KVM's IBPB.
> 
> Exactly, so meaning that the only time this would make sense is for some sort of nested
> situation or some other funky VMM tomfoolery, but that nested hypervisor might not be 
> KVM, so it's a farce, yea? Meaning that even in that case, there is zero guarantee
> from the host kernel perspective that barriers within that process are being issued on
> switch, which would make this security posture just window dressing?
> 
> > 
> > I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of spectre
> > and is naively running multiple VMs in the same process.
> 
> Agreed. I’ve thought of always_ibpb as "paranoid mode" and if a user signs up for that,
> they rarely care about the fast path / performance implications, even if some of the
> security surface area is just complete window dressing :( 
> 
> Looking forward, what if we simplified this to have KVM issue barriers IFF always_ibpb?
> 
> And drop the cond’s, since the switching mm_structs should take care of that?
> 
> The nice part is that then the cond_mitigation() path handles the going to thread
> with flag or going from a thread with flag situation gracefully, and we don’t need to
> try to duplicate that smarts in kvm code or somewhere else.

Unless there's an edge case we're overlooking, that has my vote.  And if the
above is captured in a comment, then there shouldn't be any confusion as to why
the kernel/KVM is consuming a flag named "switch_mm" when switching vCPUs, i.e.
when there may or may not have been a change in mm structs.