Message-ID: <a1dda600-4452-4ed5-adaa-8a2c47753630@intel.com>
Date: Thu, 30 Oct 2025 08:00:46 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: chuang <nashuiliang@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
 open list <linux-kernel@...r.kernel.org>
Subject: Re: x86/fpu: Inaccurate AVX-512 Usage Tracking via arch_status
On 10/29/25 23:56, chuang wrote:
...
> I traced the code path within fpu_clone(): In fpu_clone() ->
> save_fpregs_to_fpstate(), since my current Intel CPU supports XSAVE,
> the call to os_xsave() results in the XFEATURE_Hi16_ZMM bit being
> set/enabled in xsave.header.xfeatures. This then causes
> update_avx_timestamp() to update fpu->avx512_timestamp. The same flow
> occurs in __switch_to() -> switch_fpu_prepare().
So that points more in the direction of the AVX-512 state not getting
initialized. fpu_flush_thread() either isn't getting called or isn't
doing its job at execve(). *Or*, there's something subtle in your test
case that's causing AVX-512 state to get tracked as non-init after
execve().
> Given this, is the issue related to my specific Intel Xeon Gold? Is
> the CPU continuously indicating that the AVX-512 state is in use?
As much as I love to blame the hardware, I don't think we're quite there
yet. We've literally had software bugs in the past that had this exact
same behavior: AVX-512 state was tracked as non-init when it was never used.
Any chance you could figure out where you first see XFEATURE_Hi16_ZMM in
xfeatures? The tracepoints in here might help:
	/sys/kernel/debug/tracing/events/x86_fpu
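One rough way to dig through that trace output is sketched below. It is an untested sketch, not a kernel tool: it assumes each x86_fpu event line carries the bitmap as "xfeatures: <hex>", and the helper names are made up for illustration. XFEATURE_Hi16_ZMM is bit 7 of the XSAVE feature bitmap, so the check is a mask against 0x80.

```python
import re

# XFEATURE_Hi16_ZMM is bit 7 of the XSAVE feature bitmap.
XFEATURE_HI16_ZMM_BIT = 7
XFEATURE_MASK_HI16_ZMM = 1 << XFEATURE_HI16_ZMM_BIT

# Assumption: x86_fpu trace events print the bitmap as "xfeatures: <hex>",
# with or without a leading "0x".
_XFEATURES_RE = re.compile(r"xfeatures:\s*(?:0x)?([0-9a-fA-F]+)")


def xfeatures_of(line):
    """Return the xfeatures bitmap found in a trace line, or None."""
    m = _XFEATURES_RE.search(line)
    return int(m.group(1), 16) if m else None


def has_hi16_zmm(line):
    """True if the line reports a bitmap with XFEATURE_Hi16_ZMM set."""
    xf = xfeatures_of(line)
    return xf is not None and bool(xf & XFEATURE_MASK_HI16_ZMM)


def first_hit(lines):
    """Return the first trace line with XFEATURE_Hi16_ZMM set, or None."""
    for line in lines:
        if has_hi16_zmm(line):
            return line.rstrip("\n")
    return None
```

With the events enabled (writing 1 to the "enable" file in that directory), something like first_hit(open("/sys/kernel/debug/tracing/trace")) should print the earliest captured event that has the bit set, including the task and timestamp columns that ftrace prepends.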
Is there any rhyme or reason to which tasks see avx512_timestamp
getting set? Is it just your test program? Or other random tasks on the
system?
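A quick survey of the whole system could look something like the sketch below. It assumes /proc/<pid>/arch_status exposes a line of the form "AVX512_elapsed_ms: <n>", with -1 meaning no AVX-512 use was recorded for that task. The function names are hypothetical, and it only looks at thread group leaders, not every thread.

```python
import os


def parse_avx512_elapsed_ms(text):
    """Return the AVX512_elapsed_ms value from arch_status text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "AVX512_elapsed_ms":
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def scan_tasks(proc_root="/proc"):
    """Yield (pid, comm, elapsed_ms) for tasks with a recorded timestamp."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "arch_status")) as f:
                elapsed = parse_avx512_elapsed_ms(f.read())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # task exited, or the file is not available
        # -1 means the kernel has no AVX-512 use recorded for this task.
        if elapsed is not None and elapsed >= 0:
            yield int(entry), comm, elapsed
```

Running "for pid, comm, ms in scan_tasks(): print(pid, comm, ms)" right after boot, and again after starting the test program, should show whether the timestamp shows up only for the test or for unrelated tasks as well.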