linux-kernel - Re: [PATCH V2 02/12] KVM: x86: Allow the use of kvm_load_host_xsave_state() with guest_state


Message-ID: <CABgObfbr4+y57wOiHwZZjv80rE3Bs3MujYY8HvQgDTDoctzpoQ@mail.gmail.com>
Date: Mon, 10 Mar 2025 20:08:11 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Xiaoyao Li <xiaoyao.li@...el.com>, Adrian Hunter <adrian.hunter@...el.com>, 
	kvm <kvm@...r.kernel.org>, Rick Edgecombe <rick.p.edgecombe@...el.com>, 
	Kai Huang <kai.huang@...el.com>, reinette.chatre@...el.com, 
	Tony Lindgren <tony.lindgren@...ux.intel.com>, Binbin Wu <binbin.wu@...ux.intel.com>, 
	David Matlack <dmatlack@...gle.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	Nikolay Borisov <nik.borisov@...e.com>, linux-kernel@...r.kernel.org, 
	Yan Zhao <yan.y.zhao@...el.com>, Chao Gao <chao.gao@...el.com>, 
	Weijiang Yang <weijiang.yang@...el.com>
Subject: Re: [PATCH V2 02/12] KVM: x86: Allow the use of kvm_load_host_xsave_state()
 with guest_state_protected

On Sat, Mar 8, 2025 at 12:04 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Mar 06, 2025, Paolo Bonzini wrote:
> I still absolutely detest carrying dedicated code
> for SEV and TDX state management.  It's bad enough that figuring out WTF actually
> happens basically requires encyclopedic knowledge of massive specs.
>
> I tried to figure out a way to share code, but everything I can come up with that
> doesn't fake vCPU state makes the non-TDX code a mess.  :-(

The only thing worse is requiring encyclopedic knowledge of both the
specs and KVM. :)  And yeah, we do require some knowledge of parts of
KVM that *shouldn't* matter for protected-state guests, but it shouldn't
be worse than needed.

There is different microcode/firmware for VMX/SVM/SEV-ES+/TDX, and the
chance of sharing code gets lower and lower as more stuff is added
there, as is the case for SEV-ES/SNP and TDX. That is why the state
management code for TDX is doing its own thing most of the time anyway;
there's no point in sharing a small piece that is not even the hardest
part.

> > just so that the common code does the right thing for pkru/xcr0/xss,
>
> FWIW, it's not just so that KVM does the right thing for those values, it's a
> defense in depth mechanism so that *when*, not if, KVM screws up, the odds of the
> bug being fatal to KVM and/or the guest are reduced.

I would say the other way round is true too.  Not relying too much on
fake values in vcpu->arch can be more robust.

> Without actual sanity check and safeguards in the low level helpers, we absolutely
> are playing a game of whack-a-mole.
>
> E.g. see commit 9b42d1e8e4fe ("KVM: x86: Play nice with protected guests in
> complete_hypercall_exit()").
>
> At a glance, kvm_hv_hypercall() is still broken, because is_protmode() will return
> false incorrectly.

So the fixes are needed anyway and we're playing the game anyway. :(

> > And while the change for XSS (and possibly other MSRs) is actually correct,
> > it should be justified for both SEV-ES/SNP and TDX rather than sneaked into
> > the TDX patches.
> >
> > While there could be other flows that consume guest state, they're
> > just as bound to do the wrong thing if vcpu->arch is only guaranteed
> > to be somehow plausible (think anything that for whatever reason uses
> > cpu_role).
>
> But the MMU code is *already* broken.  kvm_init_mmu() => vcpu_to_role_regs().  It
> "works" because the fubar role is never truly consumed.  I'm sure there are more
> examples.

Yes, and there should be at least a WARN_ON_ONCE when it is accessed,
even if we don't completely cull the initialization of cpu_role...
Loading the XSAVE state isn't any different.
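
To illustrate the kind of guard meant here, a minimal userspace sketch
follows. Everything in it is a hypothetical stand-in (struct mock_vcpu,
mock_warn_on_once(), mock_guest_pkru_usable()); it is not KVM code and
only imitates the warn-once-and-bail pattern:

```c
#include <stdbool.h>
#include <stdio.h>

/* CR4.PKE is bit 22 of CR4 on x86. */
#define MOCK_CR4_PKE (1UL << 22)

/* Hypothetical stand-in for the relevant vCPU fields; not KVM's real layout. */
struct mock_vcpu {
	bool guest_state_protected;	/* true for SEV-ES+/TDX-style guests */
	unsigned long cr4;		/* possibly fake value kept in vcpu->arch */
};

static unsigned int mock_warnings_emitted;

/*
 * Simplified imitation of WARN_ON_ONCE(): print on the first hit only and
 * return the condition.  Unlike the kernel macro, this sketch shares one
 * "already warned" flag across all callers.
 */
static bool mock_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		mock_warnings_emitted++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Consumer of guest state: refuse (and warn once) for protected guests. */
static bool mock_guest_pkru_usable(const struct mock_vcpu *vcpu)
{
	if (mock_warn_on_once(vcpu->guest_state_protected,
			      "guest register state consumed for protected guest"))
		return false;

	return (vcpu->cr4 & MOCK_CR4_PKE) != 0;
}
```

With a guard like this, a flow that should never run for a protected
guest is flagged the first time it does, and the result no longer
depends on whatever value happens to sit in cr4.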

I'm okay with placing some values in cr0/cr4 or even xcr0/xss, but I do
not wish to use them more than the absolute minimum necessary. And I
would rather not set more than the bare minimum needed in CR4... why
set CR4.PKE, for example, if KVM has no business using the guest
PKRU anyway?

Paolo

> > There's no way the existing flows for !guest_state_protected should run _at
> > all_ when the register state is not there. If they do, it's a bug and fixing
> > them is the right thing to do (it may feel like whack-a-mole but isn't)
>
> Eh, it's still whack-a-mole, there just happen to be a finite number of moles :-)