lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aD6Ukwqz2Q5RKpEm@intel.com>
Date: Tue, 3 Jun 2025 14:22:11 +0800
From: Chao Gao <chao.gao@...el.com>
To: "Chang S. Bae" <chang.seok.bae@...el.com>
CC: Dave Hansen <dave.hansen@...el.com>, Sean Christopherson
	<seanjc@...gle.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>, <tglx@...utronix.de>, <pbonzini@...hat.com>,
	<peterz@...radead.org>, <rick.p.edgecombe@...el.com>,
	<weijiang.yang@...el.com>, <john.allen@....com>, <bp@...en8.de>,
	<xin3.li@...el.com>, Dave Hansen <dave.hansen@...ux.intel.com>, Eric Biggers
	<ebiggers@...gle.com>, "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar
	<mingo@...hat.com>, Kees Cook <kees@...nel.org>, Maxim Levitsky
	<mlevitsk@...hat.com>, Mitchell Levy <levymitchell0@...il.com>, "Nikolay
 Borisov" <nik.borisov@...e.com>, Oleg Nesterov <oleg@...hat.com>, Sohil Mehta
	<sohil.mehta@...el.com>, Stanislav Spassov <stanspas@...zon.de>, "Vignesh
 Balasubramanian" <vigbalas@....com>
Subject: Re: [PATCH v8 0/6] Introduce CET supervisor state support

On Mon, Jun 02, 2025 at 12:12:51PM -0700, Chang S. Bae wrote:
>== Preempt Case ==
>
>To illustrate how the XFD MSR state becomes incorrect in this scenario:
>
> task #1 (fpstate->xfd=0)  task #2 (fpstate->xfd=0x80000)
> ========================  ==============================
>                           handle_signal()
>                           -> setup_rt_frame()
>                              -> get_siframe()
>                                 -> copy_fpstate_to_sigframe()
>                                    -> fpregs_unlock()
>                                 ...
>  ...
>  switch_fpu_return()
>  -> fpregs_restore_userregs()
>     -> restore_fpregs_from_fpstate()
>        -> xfd_write_state()
>           ^ IA32_XFD_MSR = 0
>  ...
>                                 ...
>                              -> fpu__clear_user_states()
>                                 -> fpregs_lock()
>                                 -> restore_fpregs_from_init_fpstate()
>                                    -> os_rstor()
>                                       -> xfd_validate_state()
>                                          ^ IA32_XFD_MSR != fpstate->xfd
>                                 -> fpregs_mark_active()
>                                 -> fpregs_unlock()
>
>Since fpu__clear_user_states() marks the FPU state as valid in the end, an
>XFD MSR sync-up was clearly missing.

Thanks for this analysis. It makes sense.

>
>== Return-to-Userspace Path ==
>
>Both tasks at that moment are on the return-to-userspace path, but at
>different points in IRQ state:
>
>  * task #2 is inside handle_signal() and already re-enabled IRQs.
>  * task #1 is after IRQ is disabled again when calling
>    switch_fpu_return().
>
>  local_irq_disable_exit_to_user()
>  exit_to_user_mode_prepare()
>  -> exit_to_user_mode_loop()
>     -> local_irq_enable_exit_to_user()
>        -> arch_do_signal_or_restart()
>           -> handle_signal()
>     -> local_irq_disable_exit_to_user()
>  -> arch_exit_user_mode_prepare()
>     -> arch_exit_work()
>        -> switch_fpu_return()
>
>This implies that fpregs_lock()/fpregs_unlock() is necessary inside
>handle_signal() when XSAVE instructions are invoked.
>
>But, it should be okay for switch_fpu_return() to call
>fpregs_restore_userregs() without fpregs_lock().
>
>== XFD Sanity Checker ==
>
>The XFD sanity checker -- xfd_op_valid() -- correctly caught this issue in
>the test case. However, it may have a false negative when AMX usage was
>flipped between the two tasks.
>
>Despite that, I don't think extending its coverage is worthwhile, as it would
>complicate the logic. The current logic and documentation seem sound.
>
>== Fix Consideration ==
>
>I think the fix is straightforward: resynchronize the IA32_XFD MSR in
>fpu__clear_user_states().

This fix sounds good.

Btw, what do you think the impact of this issue might be?

Aside from the splat, task #2 could execute AMX instructions without
requesting permissions, but its AMX state would be discarded during the
next FPU switch, as RFBM[18] is cleared when executing XSAVES. And, in the
"flipped" scenario you mentioned, task #2 might receive an extra #NM, after
which its fpstate would be re-allocated (although the size won't increase
further).

So, for well-behaved tasks that never use AMX, there is no impact; tasks
that use AMX may receive extra #NM. There won't be any unexpected #PF,
memory corruption, or kernel panic.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ