linux-kernel - Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YemfC17ZJyR0CLYr@google.com>
Date:   Thu, 20 Jan 2022 17:42:35 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Andrew Cooper <amc96@...f.net>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        David Woodhouse <dwmw2@...radead.org>,
        Alexander Graf <graf@...zon.de>,
        Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [PATCH] KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS
 blocking shadow

On Thu, Jan 20, 2022, Andrew Cooper wrote:
> On 20/01/2022 16:36, Sean Christopherson wrote:
> > On Thu, Jan 20, 2022, Andrew Cooper wrote:
> >> On 20/01/2022 00:06, Sean Christopherson wrote:
> >>> MOVSS blocking can be initiated by userspace, but can be coincident with
> >>> a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
> >>> executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
> >>> is problematic only for CPL0 (and only if the guest is crazy enough to
> >>> access a DR in a MOVSS shadow).  All other sources of #DBs are either
> >>> suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
> >> It is more complicated than this and undocumented.  Single step is
> >> discard in a shadow, while data breakpoints are deferred.
> > But for the purposes of making the consitency check happy, whether they are
> > deferred or dropped should be irrelevant, no?
> 
> From that point of view, yes.  The consistency check is specific to TS. 
> I suppose I was mostly questioning the wording of the explanation.
> 
> >>> are mutually exclusive with MOVSS blocking (T-bit task switch),
> >> Howso?  MovSS prevents external interrupts from triggering task
> >> switches, but instruction sources still trigger in a shadow.
> > T-bit #DBs are traps, and arrive after the task switch has completed.  The switch
> > can be initiated in the shadow, but the #DB will be delivered after the instruction
> > retires and so after MOVSS blocking goes away.  Or am I missing something?
> 
> Well - this is where the pipeline RTL is needed, in lieu of anything
> better.  Trap-style #DBs are part of the current instruction, and
> specifically ahead (in the instruction cycle) of the subsequent intchk.

And T-bit traps in particular have crazy high priority...

> There are implementations where NMI/INTR/etc won't be delivered at the
> head of an exception generated in a shadow, which would suggest that
> these implementations have the falling edge of the shadow after intchk
> on the instruction boundary.  (Probably certainly what happens is that
> intchk is responsible for clearing the shadow, but this is entirely
> guesswork on my behalf.)

Well, thankfully hardware's behavior should be moot for VM-Entry since task switches
unconditionally VM-Exit, and KVM has a big fat TODO for handling the T-bit.

> >> and splitlock which is new since I last thought about this problem.
> > Eww.  Split Lock is trap-like, which begs the question of what happens if the
> > MOV/POP SS splits a cache line when loading the source data.  I'm guess it's
> > suppressed, a la data breakpoints, but that'd be a fun one to test.
> 
> They're both reads of their memory operand, so aren't eligible to be
> locked accesses.

Hah, right, the "lock" part of "split lock" is just a minor detail...

> However, a devious kernel can misalign the GDT/LDT such that setting the
> descriptor access bit does trigger a splitlock.  I suppose "kernel
> doesn't misalign structures", or "kernel doesn't write a descriptor with
> the access bit clear" are both valid mitigations.
> 
> >>> This bug was originally found by running tests[1] created for XSA-308[2].
> >>> Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
> >>> presumably why the Xen bug was deemed to be an exploitable DOS from guest
> >>> userspace.
> >> As I recall, the original report to the security team was something
> >> along the lines of "Steam has just updated game, and now when I start
> >> it, the VM explodes".
> > Lovely.  I wonder if the game added some form of anti-cheat?  I don't suppose you
> > have disassembly from the report?  I'm super curious what on earth a game would
> > do to trigger this.
> 
> Anti-cheat was my guess too, but no disassembly happened.
> 
> I was already aware of the STI issue, and had posted
> https://lore.kernel.org/xen-devel/1528120755-17455-11-git-send-email-andrew.cooper3@citrix.com/
> more than a year previously.  The security report showed ICEBP pending
> in the INTR_INFO field, and extending the STI test case in light of this
> was all of 30s of work to get a working repro.
> 
> ~Andrew