Message-ID: <aWkYVwTyOPxnRgzN@google.com>
Date: Thu, 15 Jan 2026 08:39:51 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>,
"Kernel Mailing List, Linux" <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [PATCH v2 0/4] x86, fpu/kvm: fix crash with AMX
On Thu, Jan 15, 2026, Paolo Bonzini wrote:
> On Thu, 15 Jan 2026 at 13:22, Borislav Petkov <bp@...en8.de> wrote:
> >
> > On Thu, Jan 01, 2026 at 10:05:12AM +0100, Paolo Bonzini wrote:
> > > Fix a possible host panic, due to an unexpected #NM, when a KVM guest
> > > is using AMX features.
> > >
> > > The guest's XFD value, which is stored in fpstate->xfd, is used for both
> > > guest execution and host XSAVE operations.
> >
> > This already sounds weird. Why?
>
> Because the state of disabled components is undefined anyway. There's
> no point in making all host XSAVEs more expensive, even when the TMM
> registers aren't in use by the guest (which will likely be most of
> the time).
>
> > Why don't we carry separate XFD copies - guest and host - which we use for the
> > guest and the host, respectively?
>
> That was exactly what I did in v1, but it's more code and less efficient too.
And creates a weird ABI for KVM:
: This also creates a nasty, subtle asymmetry in KVM's ABI. Notably, the comment
: above is wrong. XSAVE does NOT run with fpstate->xfd, it runs with whatever
: happens to be in hardware. For non-guest tasks, fpstate->xfd is guaranteed to
: be resident in hardware when save_fpregs_to_fpstate() runs, but for guest tasks,
: it will usually be the _guest's_ value. So in the common case, KVM_GET_XSAVE2
: would not return the same data set by KVM_SET_XSAVE.
:
: In theory we could ensure KVM saved exactly what is resident in hardware, but
: that's quite tricky (and costly!) as it would require doing xfd_update_state()
: before _every_ save_fpregs_to_fpstate(), e.g. not just in fpu_swap_kvm_fpstate().
: E.g. if the host kernel used the FPU from IRQ context (spoiler alert!), then KVM
: wouldn't have a chance to swap in the maximal XFD[18]=0 value (i.e. the userspace
: task's XFD).
And IMO papered over the true bug, which is that the xstate snapshot can become
inconsistent relative to KVM's tracking of guest XFD:
: Lastly, the fix is effectively papering over another bug, which I'm pretty sure
: is the underlying issue that was originally encountered. Assuming QEMU doesn't
: intercept MSR_IA32_XFD for its own purposes, the only sequence I've come up with
: that would result in KVM trying to load XTILE data with XFD[18]=1, without a
: colluding userspace VMM (Paolo's selftest) is:
:
: 1. vCPU loads non-init XTILE data without ever setting XFD to a non-zero value
: (KVM only disables XFD interception on writes with a non-zero value).
: 2. Guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1
: 3. VM-Exit due to the WRMSR
: 4. Host IRQ arrives and triggers kernel_fpu_begin()
: 5. save_fpregs_to_fpstate() saves guest FPU with XFD[18]=0
: 6. fpu_update_guest_xfd() stuffs guest_fpu->fpstate->xfd = XFD[18]=1
: 7. vcpu_enter_guest() attempts to load XTILE data with XFD[18]=1
:
: Note! There's no KVM_SET_XSAVE2 in the above, i.e. this doesn't require userspace
: to trigger save/restore for live migration or whatever, the only timing condition
: is the arrival of an IRQ that uses kernel FPU during the XFD 0=>1 VM-Exit.
https://lore.kernel.org/all/aVMEcaZD_SzKzRvr@google.com