[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZK7YEUE9lxlvagsv@google.com>
Date: Wed, 12 Jul 2023 09:42:57 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Weijiang Yang <weijiang.yang@...el.com>
Cc: pbonzini@...hat.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, peterz@...radead.org,
rppt@...nel.org, binbin.wu@...ux.intel.com,
rick.p.edgecombe@...el.com, john.allen@....com,
Sean Christopherson <sean.j.christopherson@...el.com>,
Gil Neiger <gil.neiger@...el.com>
Subject: Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs
On Fri, Jul 07, 2023, Weijiang Yang wrote:
> > Side topic, what on earth does the SDM mean by this?!?
> >
> > The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
> > (hardware requires bits 1:0 to be 0).
> >
> > I know Intel retroactively changed the alignment requirements, but the above
> > is nonsensical. If ucode prevents writing bits 2:0, who cares what hardware
> > requires?
>
> Hi, Sean,
>
> Regarding the alignment check, I got update from Gil:
>
> ==================================================
>
> The WRMSR instruction to load IA32_PL[0-3]_SSP will #GP if the value to be
> loaded sets either bit 0 or bit 1. It does not check bit 2.
> IDT event delivery, when changing to rings 0-2 will load SSP from the MSR
> corresponding to the new ring. These transitions check that bits 2:0 of the
> new value are all zero and will generate a nested fault if any of those bits
> are set. (Far CALL using a call gate also checks this if changing CPL.)
>
> For a VMM that is emulating a WRMSR by a guest OS (because it was
> intercepting writes to that MSR), it suffices to perform the same checks as
> the CPU would (i.e., only bits 1:0):
> • If the VMM sees bits 1:0 clear, it can perform the write on the part of
> the guest OS. If the guest OS later encounters a #GP during IDT event
> delivery (because bit 2 is set), it is its own fault.
> • If the VMM sets either bit 0 or bit 1 set, it should inject a #GP into
> the guest, as that is what the CPU would do in this case.
>
> For an OS that is writing to the MSRs to set up shadow stacks, it should
> WRMSR the base addresses of those stacks. Because of the token-based
> architecture used for supervisor shadow stacks (for rings 0-2), the base
> addresses of those stacks should be 8-byte aligned (clearing bits 2:0).
> Thus, the values that an OS writes to the corresponding MSRs should clear
> bits 2:0.
>
> (Of course, most OS’s will use only the MSR for ring 0, as most OS’s do not
> use rings 1 and 2.)
>
> In contrast, the IA32_PL3_SSP MSR holds the current SSP for user software.
> When a user thread is created, I suppose it may reference the base of the
> user shadow stack. For a 32-bit app, that needs to be 4-byte aligned (bits
> 1:0 clear); for a 64-bit app, it may be necessary for it to be 8-byte
> aligned (bits 2:0) clear.
>
> Once the user thread is executing, the CPU will load IA32_PL3_SSP with the
> user’s value of SSP on every exception and interrupt to ring 0. The value
> at that time may be 4-byte or 8-byte aligned, depending on how the user
> thread is using the shadow stack. On context switches, the OS should WRMSR
> whatever value was saved (by RDMSR) the last time there was a context switch
> away from the incoming thread. The OS should not need to inspect or change
> this value.
>
> ===================================================
>
> Based on his feedback, I think VMM needs to check bits 1:0 when write the
> SSP MSRs. Is it?
Yep, KVM should only check bits 1:0 when emulating WRMSR. KVM doesn't emulate
event delivery except for Real Mode, and I don't see that ever changing. So to
"handle" the #GP during event delivery case, KVM just needs to propagate the "bad"
value into guest context, which KVM needs to do anyways.
Thanks for following up on this!
Powered by blists - more mailing lists