Message-ID: <aMnJYWKf63Ay+pIA@AUSJOHALLEN.amd.com>
Date: Tue, 16 Sep 2025 15:33:01 -0500
From: John Allen <john.allen@....com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Tom Lendacky <thomas.lendacky@....com>,
Mathias Krause <minipli@...ecurity.net>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Chao Gao <chao.gao@...el.com>, Maxim Levitsky <mlevitsk@...hat.com>,
Xiaoyao Li <xiaoyao.li@...el.com>
Subject: Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the
GHCB when it's valid
On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > > marks XSS as valid on a #VMGEXIT. Like XCR0, KVM needs an up-to-date copy
> > > of XSS in order to compute the required XSTATE size when emulating
> > > CPUID.0xD.0x1 for the guest.
> > >
> > > Treat the incoming XSS change as an emulated write, i.e. validate the
> > > guest-provided value, to avoid letting the guest load garbage into KVM's
> > > tracking. Simply ignore bad values, as either the guest managed to get an
> > > unsupported value into hardware, or the guest is misbehaving and providing
> > > pure garbage. In either case, KVM can't fix the broken guest.
> > >
> > > Note, emulating the change as an MSR write also takes care of side effects,
> > > e.g. marking dynamic CPUID bits as dirty.
> > >
> > > Suggested-by: John Allen <john.allen@....com>
> > > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > > ---
> > > arch/x86/kvm/svm/sev.c | 3 +++
> > > arch/x86/kvm/svm/svm.h | 1 +
> > > 2 files changed, 4 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > if (kvm_ghcb_xcr0_is_valid(svm))
> > > __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > >
> > > + if (kvm_ghcb_xss_is_valid(svm))
> > > + __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > +
> >
> > It looks like this is the change that caused the selftest regression
> > with sev-es. It's not yet clear to me what the problem is though.
>
> Do you see any WARNs in the guest kernel log?
>
> The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> etc. But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> only used by init_xstate_size(), and I would expect the guest kernel's sanity
> checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.

Yes, actually that looks to be the case:

[ 0.463504] ------------[ cut here ]------------
[ 0.464443] XSAVE consistency problem: size 880 != kernel_size 840
[ 0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
[ 0.466443] Modules linked in:
[ 0.467445] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.17.0-rc3-shstk-v15+ #6 PREEMPT(voluntary)
[ 0.468443] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[ 0.469444] RIP: 0010:paranoid_xstate_size_valid+0x101/0x140
[ 0.470443] Code: 89 44 24 04 e8 00 fa ff ff 8b 44 24 04 eb c2 89 da 89 c6 48 c7 c7 80 f4 bc 9e 89 44 24 04 c6 05 9d a3 a4 ff 01 e8 3f fa fb fd <0f> 0b 8b 44 24 04 eb ce 80 3d 8a a3 a4 ff 00 74 09 e8 c9 f9 ff ff
[ 0.471443] RSP: 0000:ffffffff9ee03e80 EFLAGS: 00010286
[ 0.472443] RAX: 0000000000000000 RBX: 0000000000000348 RCX: c0000000fffeffff
[ 0.473443] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd83c00
[ 0.474443] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000003
[ 0.475443] R10: ffffffff9ee03d20 R11: ffff8c04fff8ffe8 R12: 0000000000000001
[ 0.476443] R13: ffffffffffffffff R14: 0000000000000001 R15: 000000007c135000
[ 0.477443] FS: 0000000000000000(0000) GS:ffff8c051c118000(0000) knlGS:0000000000000000
[ 0.478443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.479443] CR2: ffff8c03f4c01000 CR3: 0008000f73822001 CR4: 0000000000f70ef0
[ 0.480445] PKRU: 55555554
[ 0.480967] Call Trace:
[ 0.481446] <TASK>
[ 0.481856] init_xstate_size+0xa8/0x160
[ 0.482444] fpu__init_system_xstate+0x1c4/0x500
[ 0.483444] fpu__init_system+0x93/0xc0
[ 0.484443] arch_cpu_finalize_init+0xd2/0x160
[ 0.485290] start_kernel+0x330/0x470
[ 0.485444] x86_64_start_reservations+0x14/0x30
[ 0.486443] x86_64_start_kernel+0xd0/0xe0
[ 0.487443] common_startup_64+0x13e/0x141
[ 0.488444] </TASK>
[ 0.488879] ---[ end trace 0000000000000000 ]---
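
For illustration only, here is a minimal sketch of the kind of comparison that WARN reflects: a compacted XSAVE size recomputed from a feature mask, checked against the size CPUID reports. The table, `compacted_size()` and `xstate_size_consistent()` are hypothetical names and not the kernel's code, and 64-byte alignment of components is ignored. The component sizes are the architectural ones. Reading the 40-byte gap (880 - 840) as CET user plus CET supervisor state is an assumption that happens to match the numbers, not something confirmed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component table: XSTATE bit number and size in bytes. */
struct xstate_comp {
	unsigned int bit;
	uint32_t size;
};

static const struct xstate_comp comps[] = {
	{  2, 256 },	/* AVX (upper halves of YMM registers) */
	{  9,   8 },	/* PKRU */
	{ 11,  16 },	/* CET user state */
	{ 12,  24 },	/* CET supervisor state */
};

/* Legacy x87/SSE region (512 bytes) plus the XSAVE header (64 bytes). */
#define XSAVE_LEGACY_AND_HEADER	(512u + 64u)

/*
 * Sum the sizes of the enabled extended components on top of the fixed
 * legacy area and header. Alignment is deliberately not modeled here.
 */
static uint32_t compacted_size(uint64_t mask)
{
	uint32_t size = XSAVE_LEGACY_AND_HEADER;
	size_t i;

	for (i = 0; i < sizeof(comps) / sizeof(comps[0]); i++) {
		if (mask & (1ull << comps[i].bit))
			size += comps[i].size;
	}
	return size;
}

/* True if the CPUID-reported size matches what the mask implies. */
static bool xstate_size_consistent(uint32_t cpuid_size, uint64_t mask)
{
	return cpuid_size == compacted_size(mask);
}
```

Under those assumptions, a mask of x87 | SSE | AVX | PKRU (0x207) gives 840, and adding bits 11 and 12 (0x1a07) gives 880, i.e. the two sizes in the WARN would differ by exactly the CET components.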
>
> Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
> was masking a pre-existing (or just different) bug, and that "fixing" that flaw
> by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
> the bug.
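
To make that second possibility concrete, a toy model of the two behaviors (the struct and both helpers are hypothetical and only loosely echo the names in the thread; this is not KVM code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the per-vCPU state discussed above. */
struct toy_vcpu {
	uint64_t xss;		/* tracked XSS value */
	bool cpuid_dirty;	/* dynamic CPUID bits need recomputing */
};

/* Old behavior in this model: every write marks CPUID dirty. */
static void toy_set_xss_always(struct toy_vcpu *v, uint64_t data)
{
	v->xss = data;
	v->cpuid_dirty = true;
}

/* New behavior in this model: an unchanged value skips the marking. */
static void toy_set_xss_elide(struct toy_vcpu *v, uint64_t data)
{
	if (v->xss == data)
		return;

	v->xss = data;
	v->cpuid_dirty = true;
}
```

If some other path left the tracked value current but the derived CPUID data stale, a write of the same value would repair it in the first variant and leave it stale in the second.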