linux-kernel - Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32

Message-ID: <aMnJYWKf63Ay+pIA@AUSJOHALLEN.amd.com>
Date: Tue, 16 Sep 2025 15:33:01 -0500
From: John Allen <john.allen@....com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Tom Lendacky <thomas.lendacky@....com>,
	Mathias Krause <minipli@...ecurity.net>,
	Rick Edgecombe <rick.p.edgecombe@...el.com>,
	Chao Gao <chao.gao@...el.com>, Maxim Levitsky <mlevitsk@...hat.com>,
	Xiaoyao Li <xiaoyao.li@...el.com>
Subject: Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the
 GHCB when it's valid

On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > > marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> > > of XSS in order to compute the required XSTATE size when emulating
> > > CPUID.0xD.0x1 for the guest.
> > > 
> > > > Treat the incoming XSS change as an emulated write, i.e. validate the
> > > guest-provided value, to avoid letting the guest load garbage into KVM's
> > > tracking.  Simply ignore bad values, as either the guest managed to get an
> > > unsupported value into hardware, or the guest is misbehaving and providing
> > > pure garbage.  In either case, KVM can't fix the broken guest.
> > > 
> > > Note, emulating the change as an MSR write also takes care of side effects,
> > > e.g. marking dynamic CPUID bits as dirty.
> > > 
> > > Suggested-by: John Allen <john.allen@....com>
> > > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > > ---
> > >  arch/x86/kvm/svm/sev.c | 3 +++
> > >  arch/x86/kvm/svm/svm.h | 1 +
> > >  2 files changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > >  
> > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > +
> > 
> > It looks like this is the change that caused the selftest regression
> > with sev-es. It's not yet clear to me what the problem is though.
> 
> Do you see any WARNs in the guest kernel log?
> 
> The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> only used by init_xstate_size(), and I would expect the guest kernel's sanity
> checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.

Yes, actually that looks to be the case:

[    0.463504] ------------[ cut here ]------------
[    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
[    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
[    0.466443] Modules linked in:
[    0.467445] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.17.0-rc3-shstk-v15+ #6 PREEMPT(voluntary)
[    0.468443] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[    0.469444] RIP: 0010:paranoid_xstate_size_valid+0x101/0x140
[    0.470443] Code: 89 44 24 04 e8 00 fa ff ff 8b 44 24 04 eb c2 89 da 89 c6 48 c7 c7 80 f4 bc 9e 89 44 24 04 c6 05 9d a3 a4 ff 01 e8 3f fa fb fd <0f> 0b 8b 44 24 04 eb ce 80 3d 8a a3 a4 ff 00 74 09 e8 c9 f9 ff ff
[    0.471443] RSP: 0000:ffffffff9ee03e80 EFLAGS: 00010286
[    0.472443] RAX: 0000000000000000 RBX: 0000000000000348 RCX: c0000000fffeffff
[    0.473443] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd83c00
[    0.474443] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000003
[    0.475443] R10: ffffffff9ee03d20 R11: ffff8c04fff8ffe8 R12: 0000000000000001
[    0.476443] R13: ffffffffffffffff R14: 0000000000000001 R15: 000000007c135000
[    0.477443] FS:  0000000000000000(0000) GS:ffff8c051c118000(0000) knlGS:0000000000000000
[    0.478443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.479443] CR2: ffff8c03f4c01000 CR3: 0008000f73822001 CR4: 0000000000f70ef0
[    0.480445] PKRU: 55555554
[    0.480967] Call Trace:
[    0.481446]  <TASK>
[    0.481856]  init_xstate_size+0xa8/0x160
[    0.482444]  fpu__init_system_xstate+0x1c4/0x500
[    0.483444]  fpu__init_system+0x93/0xc0
[    0.484443]  arch_cpu_finalize_init+0xd2/0x160
[    0.485290]  start_kernel+0x330/0x470
[    0.485444]  x86_64_start_reservations+0x14/0x30
[    0.486443]  x86_64_start_kernel+0xd0/0xe0
[    0.487443]  common_startup_64+0x13e/0x141
[    0.488444]  </TASK>
[    0.488879] ---[ end trace 0000000000000000 ]---
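
For reference, the arithmetic behind the two sizes in the WARN can be sketched
as below.  This is a minimal standalone illustration, not kernel or KVM code:
struct xcomp, xstate_compacted_size() and example_components are hypothetical
names, and the per-component sizes (AVX 256, PKRU 8, CET_U 16, CET_S 24 bytes,
none of them 64-byte aligned) are assumptions.  The real values are enumerated
by CPUID.0xD.<i>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one XSAVE state component. */
struct xcomp {
	uint32_t size;     /* bytes, as CPUID.0xD.<i>:EAX would report */
	bool     align64;  /* 64-byte alignment in the compacted format */
};

/* 512-byte legacy region (x87/SSE) plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HDR 576u

/*
 * Size of a compacted XSAVE area for the components enabled in 'mask'
 * (XCR0 | XSS).  Bits 0 and 1 live in the legacy region, so the walk
 * starts at component 2 and packs enabled components back to back.
 */
static uint32_t xstate_compacted_size(uint64_t mask,
				      const struct xcomp *comps,
				      unsigned int nr_comps)
{
	uint32_t size = XSAVE_LEGACY_AND_HDR;
	unsigned int i;

	for (i = 2; i < nr_comps && i < 64; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		if (comps[i].align64)
			size = (size + 63u) & ~63u;
		size += comps[i].size;
	}
	return size;
}

/* Assumed sizes, for illustration only. */
static const struct xcomp example_components[13] = {
	[2]  = { .size = 256, .align64 = false },  /* AVX   */
	[9]  = { .size = 8,   .align64 = false },  /* PKRU  */
	[11] = { .size = 16,  .align64 = false },  /* CET_U */
	[12] = { .size = 24,  .align64 = false },  /* CET_S */
};
```

Under those assumptions, a mask with only AVX and PKRU gives 840, and adding
the two CET supervisor components gives 880.  The 40-byte gap in the WARN
would therefore be consistent with CPUID.0xD.1.EBX being computed without the
XSS-managed components.  That is a guess from the arithmetic, not a confirmed
diagnosis.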

> 
> Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
> was masking a pre-existing (or just different) bug, and that "fixing" that flaw
> by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
> the bug.
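
To make the behavior under discussion concrete, here is a minimal toy model of
"treat the GHCB value as an emulated write": reject unsupported bits, skip the
dirty flag when the value is unchanged, and otherwise update the tracked XSS
and mark CPUID as needing a refresh.  Everything prefixed with toy_ is
hypothetical and only loosely modeled on the description in this thread; it is
not KVM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few vCPU fields that matter here. */
struct toy_vcpu {
	uint64_t ia32_xss;                 /* tracked XSS value */
	uint64_t supported_xss;            /* bits the guest may set */
	bool     cpuid_dynamic_bits_dirty; /* CPUID needs recomputation */
};

/* Emulated XSS write: returns 0 on success, 1 if the value is rejected. */
static int toy_set_xss(struct toy_vcpu *vcpu, uint64_t data)
{
	/* Validate the guest-provided value; garbage is not tracked. */
	if (data & ~vcpu->supported_xss)
		return 1;

	/* Unchanged value: nothing to update, dirty flag is elided. */
	if (vcpu->ia32_xss == data)
		return 0;

	vcpu->ia32_xss = data;
	vcpu->cpuid_dynamic_bits_dirty = true;
	return 0;
}

/* Sync on exit: only consume XSS if the guest marked it valid. */
static void toy_sync_xss_from_ghcb(struct toy_vcpu *vcpu, bool xss_valid,
				   uint64_t ghcb_xss)
{
	/* Bad values are simply ignored, as in the changelog. */
	if (xss_valid)
		(void)toy_set_xss(vcpu, ghcb_xss);
}
```

In this model, the only way for CPUID to go stale is for the tracked value to
already differ from what the guest has in hardware without a sync ever being
seen, or for a write to be rejected by the supported mask.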