linux-kernel - Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

Message-ID: <ZK7YEUE9lxlvagsv@google.com>
Date:   Wed, 12 Jul 2023 09:42:57 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Weijiang Yang <weijiang.yang@...el.com>
Cc:     pbonzini@...hat.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, peterz@...radead.org,
        rppt@...nel.org, binbin.wu@...ux.intel.com,
        rick.p.edgecombe@...el.com, john.allen@....com,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Gil Neiger <gil.neiger@...el.com>
Subject: Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

On Fri, Jul 07, 2023, Weijiang Yang wrote:
> > Side topic, what on earth does the SDM mean by this?!?
> > 
> >    The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
> >    (hardware requires bits 1:0 to be 0).
> > 
> > I know Intel retroactively changed the alignment requirements, but the above
> > is nonsensical.  If ucode prevents writing bits 2:0, who cares what hardware
> > requires?
> 
> Hi, Sean,
> 
> Regarding the alignment check, I got an update from Gil:
> 
> ==================================================
> 
> The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to be
> loaded sets either bit 0 or bit 1.  It does not check bit 2.
> IDT event delivery, when changing to rings 0-2 will load SSP from the MSR
> corresponding to the new ring.  These transitions check that bits 2:0 of the
> new value are all zero and will generate a nested fault if any of those bits
> are set.  (Far CALL using a call gate also checks this if changing CPL.)
> 
> For a VMM that is emulating a WRMSR by a guest OS (because it was
> intercepting writes to that MSR), it suffices to perform the same checks as
> the CPU would (i.e., only bits 1:0):
> •    If the VMM sees bits 1:0 clear, it can perform the write on the part of
> the guest OS.  If the guest OS later encounters a #GP during IDT event
> delivery (because bit 2 is set), it is its own fault.
> •    If the VMM sees either bit 0 or bit 1 set, it should inject a #GP into
> the guest, as that is what the CPU would do in this case.
> 
> For an OS that is writing to the MSRs to set up shadow stacks, it should
> WRMSR the base addresses of those stacks.  Because of the token-based
> architecture used for supervisor shadow stacks (for rings 0-2), the base
> addresses of those stacks should be 8-byte aligned (clearing bits 2:0). 
> Thus, the values that an OS writes to the corresponding MSRs should clear
> bits 2:0.
> 
> (Of course, most OS’s will use only the MSR for ring 0, as most OS’s do not
> use rings 1 and 2.)
> 
> In contrast, the IA32_PL3_SSP MSR holds the current SSP for user software. 
> When a user thread is created, I suppose it may reference the base of the
> user shadow stack.  For a 32-bit app, that needs to be 4-byte aligned (bits
> 1:0 clear); for a 64-bit app, it may be necessary for it to be 8-byte
> aligned (bits 2:0 clear).
> 
> Once the user thread is executing, the CPU will load IA32_PL3_SSP with the
> user’s value of SSP on every exception and interrupt to ring 0.  The value
> at that time may be 4-byte or 8-byte aligned, depending on how the user
> thread is using the shadow stack.  On context switches, the OS should WRMSR
> whatever value was saved (by RDMSR) the last time there was a context switch
> away from the incoming thread.  The OS should not need to inspect or change
> this value.
> 
> ===================================================
> 
> Based on his feedback, I think the VMM needs to check bits 1:0 when writing
> the SSP MSRs. Is that right?

Yep, KVM should only check bits 1:0 when emulating WRMSR.  KVM doesn't emulate
event delivery except for Real Mode, and I don't see that ever changing.  So to
"handle" the #GP during event delivery case, KVM just needs to propagate the "bad"
value into guest context, which KVM needs to do anyways.
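
To make the two checks concrete, here is a rough standalone sketch of the
distinction Gil describes.  The helper names are made up for illustration and
are not KVM's actual functions, and the sketch deliberately ignores any other
validation a real WRMSR emulation path would need.  It only shows that a value
with bit 2 set passes the WRMSR check (bits 1:0) but would fail the stricter
check that event delivery applies (bits 2:0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only (not KVM code).
 *
 * WRMSR to IA32_PL[0-3]_SSP: the CPU #GPs if bit 0 or bit 1 of the
 * value is set.  Bit 2 is not checked at WRMSR time.
 */
static inline bool ssp_wrmsr_value_ok(uint64_t data)
{
	return (data & 0x3) == 0;
}

/*
 * IDT event delivery to rings 0-2: bits 2:0 of the SSP loaded from the
 * MSR must all be zero, otherwise the CPU generates a nested fault.
 */
static inline bool ssp_event_delivery_ok(uint64_t ssp)
{
	return (ssp & 0x7) == 0;
}

/* Small self-check of the cases discussed in this thread. */
static inline void ssp_checks_selftest(void)
{
	/* 8-byte aligned: fine everywhere. */
	assert(ssp_wrmsr_value_ok(0x1000) && ssp_event_delivery_ok(0x1000));

	/* Bit 2 set: WRMSR succeeds, the fault comes later at delivery. */
	assert(ssp_wrmsr_value_ok(0x1004) && !ssp_event_delivery_ok(0x1004));

	/* Bit 0 or bit 1 set: WRMSR itself must #GP. */
	assert(!ssp_wrmsr_value_ok(0x1001));
	assert(!ssp_wrmsr_value_ok(0x1002));
}
```

I.e. an emulated WRMSR would inject #GP only when the first helper returns
false, and otherwise just stuff the value into guest state and let hardware
sort out bit 2 at event delivery.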

Thanks for following up on this!