[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7e589b35-4ff8-43fa-99dd-d3b17f56d3ea@intel.com>
Date: Wed, 8 May 2024 17:29:05 -0700
From: "Chang S. Bae" <chang.seok.bae@...el.com>
To: Dave Hansen <dave.hansen@...el.com>, <linux-kernel@...r.kernel.org>
CC: <x86@...nel.org>, <platform-driver-x86@...r.kernel.org>,
<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <hdegoede@...hat.com>,
<ilpo.jarvinen@...ux.intel.com>, <tony.luck@...el.com>,
<ashok.raj@...el.com>, <jithu.joseph@...el.com>
Subject: Re: [PATCH v2 1/2] x86/fpu: Extend kernel_fpu_begin_mask() to
initialize AMX state
On 5/8/2024 7:40 AM, Dave Hansen wrote:
>
> As I look closer, I'm not sure I think this is a good match for the two
> other KFPU_* flags. I don't think either KFPU_387 or KFPU_MXCSR causes
> any XFEATURE to be tracked as init. The SDM says that FNINIT "sets the
> FPU control, status, tag, instruction pointer, and data pointer
> registers to their default states."
>
> Off the top of my head, I don't know how that maps to the XSAVE init
> tracking but I do know that MXCSR has a very weird mapping to the first
> two XSAVE features.
FNINIT does *not* set the component to be tracked as init. Indeed, the
SSE component is somewhat convoluted. AMX state will be likely tracked
as init. But, I believe this init tracking mechanism is
implementation-specific, not architectural. Beyond this intricacy of
init-tracking, KFPU_* flags all initialize each state:
/* Kernel FPU states to initialize in kernel_fpu_begin_mask() */
#define KFPU_387 _BITUL(0) /* 387 state will be initialized */
#define KFPU_MXCSR _BITUL(1) /* MXCSR will be initialized */
> I really think it would be simplest to just have this whole thing do this:
>
> kernel_fpu_begin();
>
> // Zap AMX state
>
> kernel_fpu_end();
>
> Where the zap is either an os_xrstor() or an explicit tile release
> instruction.
If I understand correctly, the change could be something like this,
although I'm not sure about this new API naming and prototype at this point:
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a2be3aefff9f..906634402ea6 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -28,6 +28,7 @@
extern void kernel_fpu_begin_mask(unsigned int kfpu_mask);
extern void kernel_fpu_end(void);
+extern void kernel_fpu_reset(void);
extern bool irq_fpu_usable(void);
extern void fpregs_mark_activate(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1209c7aebb21..0351f9ee3e49 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -452,6 +452,15 @@ void kernel_fpu_end(void)
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
+void kernel_fpu_reset(void)
+{
+ kernel_fpu_begin();
+ if (cpu_feature_enabled(X86_FEATURE_AMX_TILE))
+ tile_release();
+ kernel_fpu_end();
+}
+EXPORT_SYMBOL(kernel_fpu_reset);
+
/*
* Sync the FPU register state to current's memory register state when the
* current task owns the FPU. The hardware register state is preserved.
diff --git a/drivers/platform/x86/intel/ifs/ifs.h
b/drivers/platform/x86/intel/ifs/ifs.h
index 56b9f3e3cf76..5e7ba94b4054 100644
--- a/drivers/platform/x86/intel/ifs/ifs.h
+++ b/drivers/platform/x86/intel/ifs/ifs.h
@@ -129,6 +129,7 @@
*/
#include <linux/device.h>
#include <linux/miscdevice.h>
+#include <asm/fpu/api.h>
#define MSR_ARRAY_BIST 0x00000105
#define MSR_COPY_SCAN_HASHES 0x000002c2
diff --git a/drivers/platform/x86/intel/ifs/runtest.c
b/drivers/platform/x86/intel/ifs/runtest.c
index 95b4b71fab53..33ef4962aeba 100644
--- a/drivers/platform/x86/intel/ifs/runtest.c
+++ b/drivers/platform/x86/intel/ifs/runtest.c
@@ -188,6 +188,8 @@ static int doscan(void *data)
/* Only the first logical CPU on a core reports result */
first = cpumask_first(cpu_smt_mask(cpu));
+ kernel_fpu_reset();
+
wait_for_sibling_cpu(&scan_cpus_in, NSEC_PER_SEC);
/*
> It's just a matter of whether you want this to work like a regular old
> kernel FPU user or you want to tie it to *only* being able to run in its
> own kernel thread. -- Aside: I don't think I realized IFS had its own
> thread earlier.
While both approaches aim for the same functionality, the code change
seems to be smaller for the above. At the same time, it is notable that
these new APIs, fpu_idle_fpregs() and the other, are *only* for AMX.
1. The diff state of this series' approach:
arch/x86/include/asm/fpu/api.h | 1 +
arch/x86/kernel/fpu/core.c | 3 +++
drivers/platform/x86/intel/ifs/ifs.h | 15 +++++++++++++++
drivers/platform/x86/intel/ifs/runtest.c | 6 ++++++
4 files changed, 25 insertions(+)
2. The above code changes:
arch/x86/include/asm/fpu/api.h | 1 +
arch/x86/kernel/fpu/core.c | 9 +++++++++
drivers/platform/x86/intel/ifs/ifs.h | 1 +
drivers/platform/x86/intel/ifs/runtest.c | 2 ++
4 files changed, 13 insertions(+)
Thanks,
Chang
Powered by blists - more mailing lists