Message-ID: <CABgObfa5ViBjb_BnmKqf0+7M6rZ5-M+yOw_7tVK_Ek6tp21Z=w@mail.gmail.com>
Date: Tue, 30 Dec 2025 00:31:55 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org, 
	stable@...r.kernel.org
Subject: Re: [PATCH 2/5] x86, fpu: separate fpstate->xfd and guest XFD

On Mon, Dec 29, 2025 at 11:45 PM Sean Christopherson <seanjc@...gle.com> wrote:
> The fix works only because the userspace XFD[18] must be '0' since the kernel
> never re-disables XFD features after they are enabled.

Yes, this is why I considered XFD[18]=1 to be a bug.

> Which is probably fine
> in practice since re-disabling a component for a guest task would need to force
> the guest FPU back into an init state as well, but I don't love the complexity.
>
> This also creates a nasty, subtle asymmetry in KVM's ABI.

I find this second argument more convincing; I preferred the
complexity of an extra field to track the guest XFD over having to
patch xstate_bv.

(Initially I was also worried about mismatches between xstate_bv and
xcomp_bv, but those are fine: when the compacted format is in use,
xcomp_bv[62:0] is simply EDX:EAX & XCR0.)
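
In code form, the relation is just the following (a sketch, not kernel
code; rfbm stands in for the EDX:EAX requested-feature bitmap):

    /* Compacted format: bit 63 set, lower bits are RFBM & XCR0. */
    u64 xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | (rfbm & xcr0);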

> Lastly, the fix is effectively papering over another bug, which I'm pretty sure
> is the underlying issue that was originally encountered.

Yes, I agree this is the most likely scenario. Whether it's papering
over it or not depends on what you consider the invariants to be.

> So, given that KVM's effective ABI is to record XSTATE_BV[i]=0 if XFD[i]==1, I
> vote to fix this by emulating that behavior when stuffing XFD in
> fpu_update_guest_xfd(), and then manually closing the hole Paolo found in
> fpu_copy_uabi_to_guest_fpstate().

I disagree with changing the argument from const void* to void*.
Let's instead treat it as a KVM backwards-compatibility quirk:

    union fpregs_state *xstate =
        (union fpregs_state *)guest_xsave->region;
    xstate->xsave.header.xfeatures &=
        ~vcpu->arch.guest_fpu.fpstate->xfd;

It keeps the kernel-side API const as expected, and if anything I'd
consider adding a WARN to fpu_copy_uabi_to_guest_fpstate(), basically
asserting that there will be no #NM on the subsequent restore.
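
Concretely, something like this (a sketch only; "ustate" and "gfpu" are
my assumed names for the uabi buffer and guest FPU arguments of that
function, and the exact placement would need checking):

    /*
     * No component disabled by the guest's XFD may be marked present
     * in the incoming buffer, otherwise the eventual XRSTOR would
     * raise #NM in the host.
     */
    WARN_ON_ONCE(ustate->xsave.header.xfeatures & gfpu->fpstate->xfd);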

> @@ -319,10 +319,25 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
>  #ifdef CONFIG_X86_64
>  void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
>  {
> +       struct fpstate *fpstate = guest_fpu->fpstate;
> +
>         fpregs_lock();
> -       guest_fpu->fpstate->xfd = xfd;
> -       if (guest_fpu->fpstate->in_use)
> -               xfd_update_state(guest_fpu->fpstate);
> +       fpstate->xfd = xfd;
> +       if (fpstate->in_use)
> +               xfd_update_state(fpstate);
> +
> +       /*
> +        * If the guest's FPU state is NOT resident in hardware, clear disabled
> +        * components in XSTATE_BV as attempting to load disabled components
> +        * will generate #NM _in the host_, and KVM's ABI is that saving guest
> +        * XSAVE state should see XSTATE_BV[i]=0 if XFD[i]=1.
> +        *
> +        * If the guest's FPU state is in hardware, simply do nothing as XSAVE
> +        * itself saves XSTATE_BV[i] as 0 if XFD[i]=1.

s/saves/(from fpu_swap_kvm_fpstate) will save/

> +        */
> +       if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
> +               fpstate->regs.xsave.header.xfeatures &= ~xfd;

No objections to this part.  I'll play with this to adjust the
selftests either tomorrow or, more likely, on January 2nd, and send a
v2 that also includes the change from preemption_disabled to
irqs_disabled.
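
(Roughly, the selftest assertion I have in mind looks like the sketch
below; the helper names and the 512-byte header offset are my
assumptions, not code from the series: after the guest runs with
XFD[18]=1, XSTATE_BV[18] reported by KVM_GET_XSAVE should be clear.)

    struct kvm_xsave xsave;
    u64 xstate_bv;

    /* XSTATE_BV is the first u64 of the XSAVE header, at byte offset 512. */
    vcpu_ioctl(vcpu, KVM_GET_XSAVE, &xsave);
    memcpy(&xstate_bv, (u8 *)xsave.region + 512, sizeof(xstate_bv));
    TEST_ASSERT(!(xstate_bv & BIT_ULL(18)),
                "XSTATE_BV[18] must be clear when XFD[18] is set");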

I take it that you don't have any qualms about the new
fpu_load_guest_fpstate() function, but let me know if you prefer to
have it in a separate submission destined for 6.20 only.

Paolo

