Message-Id: <20250613134943.3186517-1-kan.liang@linux.intel.com>
Date: Fri, 13 Jun 2025 06:49:31 -0700
From: kan.liang@...ux.intel.com
To: peterz@...radead.org,
mingo@...hat.com,
acme@...nel.org,
namhyung@...nel.org,
tglx@...utronix.de,
dave.hansen@...ux.intel.com,
irogers@...gle.com,
adrian.hunter@...el.com,
jolsa@...nel.org,
alexander.shishkin@...ux.intel.com,
linux-kernel@...r.kernel.org
Cc: dapeng1.mi@...ux.intel.com,
ak@...ux.intel.com,
zide.chen@...el.com,
Kan Liang <kan.liang@...ux.intel.com>
Subject: [RFC PATCH 00/12] Support vector and more extended registers in perf
From: Kan Liang <kan.liang@...ux.intel.com>
Starting from Intel Ice Lake, the XMM registers can be collected in
a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP, and APX, will
be added in the upcoming Architecture PEBS as well. But that requires
hardware support.
The patch set provides a software solution to mitigate the hardware
requirement. It utilizes the XSAVES instruction to retrieve the requested
registers in the overflow handler. The feature is no longer limited to
PEBS events or specific platforms.
The hardware solution (if available) is still preferred, since it has
low overhead (especially with the large PEBS) and is more accurate.
In theory, the solution should work for all X86 platforms. But I only
have newer Intel platforms to test on, so the patch set only enables the
feature for Intel Ice Lake and later platforms.
Open:
The new registers include YMM, ZMM, OPMASK, SSP, and APX.
The bits in sample_regs_user/intr have run out. A new field in
struct perf_event_attr is required for the registers.
There could be several options as below for the new field.
- Follow a similar format to XSAVES. Introduce the below fields to store
the bitmap of the registers.
    struct perf_event_attr {
        ...
        __u64 sample_ext_regs_intr[2];
        __u64 sample_ext_regs_user[2];
        ...
    }
Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
ZMMH0-15 (16 bits), H16ZMM (16 bits), SSP
For example, if a user wants YMM8, the perf tool needs to set the
corresponding bits of XMM8 and YMMH8, and reconstruct the result.
The method is similar to the existing method for
sample_regs_user/intr, and matches the XSAVES format.
The kernel doesn't need to do extra configuration and reconstruction.
It's implemented in the patch set.
- Similar to the above method, but the fields are the bitmap of the
complete registers, e.g., YMM (16 bits), APX (16 bits),
OPMASK (8 bits), ZMM (32 bits), SSP.
The kernel needs to do extra configuration and reconstruction,
which may bring extra overhead.
- Combine the XMM, YMM, and ZMM. So all the registers can be put into
one u64 field.
...
    union {
        __u64 sample_ext_regs_intr; // sample_ext_regs_user is similar
        struct {
            __u32 vector_bitmap;
            __u32 vector_type   : 3,  // 0b001 XMM, 0b010 YMM, 0b100 ZMM
                  apx_bitmap    : 16,
                  opmask_bitmap : 8,
                  ssp_bitmap    : 1,
                  reserved      : 4;
        };
...
For example, if YMM8-15 are required,
    vector_bitmap: 0x0000ff00
    vector_type: 0x2
This method can save two __u64 in the struct perf_event_attr.
But it's not straightforward since it mixes the type and bitmap.
The kernel also needs to do extra configuration and reconstruction.
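
To make the first and third options more concrete, here is a rough
userspace sketch. It is not code from the patch set. The helper names
(ymm_from_halves, pack_ext_regs) and the enum are hypothetical, and the
bit positions in the packed word are an assumption derived from the field
order in the union above (vector_bitmap in the low 32 bits, then
vector_type, apx_bitmap, opmask_bitmap, ssp_bitmap). Explicit shifts are
used because the layout of C bit-fields is implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the vector_type values listed in the third option. */
enum vec_type { VEC_XMM = 0x1, VEC_YMM = 0x2, VEC_ZMM = 0x4 };

/*
 * First option: the sample carries XMMn (low 128 bits) and YMMHn
 * (high 128 bits) separately, and the tool joins them into one
 * 256-bit YMMn value.
 */
static void ymm_from_halves(const uint8_t xmm[16], const uint8_t ymmh[16],
                            uint8_t ymm[32])
{
    memcpy(ymm, xmm, 16);       /* bits 0-127 come from XMMn */
    memcpy(ymm + 16, ymmh, 16); /* bits 128-255 come from YMMHn */
}

/*
 * Third option: pack everything into a single u64.
 * Assumed layout (not confirmed by the patch set):
 *   bits  0-31  vector_bitmap
 *   bits 32-34  vector_type
 *   bits 35-50  apx_bitmap
 *   bits 51-58  opmask_bitmap
 *   bit  59     ssp_bitmap
 *   bits 60-63  reserved (left as zero)
 */
static uint64_t pack_ext_regs(uint32_t vector_bitmap, unsigned int vector_type,
                              uint16_t apx_bitmap, uint8_t opmask_bitmap,
                              unsigned int ssp)
{
    return (uint64_t)vector_bitmap |
           ((uint64_t)(vector_type & 0x7) << 32) |
           ((uint64_t)apx_bitmap << 35) |
           ((uint64_t)opmask_bitmap << 51) |
           ((uint64_t)(ssp & 0x1) << 59);
}
```

With this assumed layout, the YMM8-15 example above would be requested
as pack_ext_regs(0x0000ff00, VEC_YMM, 0, 0, 0). The type lives in the
same word as the bitmap, which is the "mixes the type and bitmap"
drawback mentioned for this option.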
Please let me know if there are more ideas.
Thanks,
Kan
Kan Liang (12):
  perf/x86: Use x86_perf_regs in the x86 nmi handler
  perf/x86: Setup the regs data
  x86/fpu/xstate: Add xsaves_nmi
  perf: Move has_extended_regs() to header file
  perf/x86: Support XMM register for non-PEBS and REGS_USER
  perf: Support extension of sample_regs
  perf/x86: Add YMMH in extended regs
  perf/x86: Add APX in extended regs
  perf/x86: Add OPMASK in extended regs
  perf/x86: Add ZMM in extended regs
  perf/x86: Add SSP in extended regs
  perf/x86/intel: Support extended registers
arch/arm/kernel/perf_regs.c | 9 +-
arch/arm64/kernel/perf_regs.c | 9 +-
arch/csky/kernel/perf_regs.c | 9 +-
arch/loongarch/kernel/perf_regs.c | 8 +-
arch/mips/kernel/perf_regs.c | 9 +-
arch/powerpc/perf/perf_regs.c | 9 +-
arch/riscv/kernel/perf_regs.c | 8 +-
arch/s390/kernel/perf_regs.c | 9 +-
arch/x86/events/core.c | 226 ++++++++++++++++++++++++--
arch/x86/events/intel/core.c | 49 ++++++
arch/x86/events/intel/ds.c | 12 +-
arch/x86/events/perf_event.h | 58 +++++++
arch/x86/include/asm/fpu/xstate.h | 1 +
arch/x86/include/asm/perf_event.h | 6 +
arch/x86/include/uapi/asm/perf_regs.h | 101 ++++++++++++
arch/x86/kernel/fpu/xstate.c | 22 +++
arch/x86/kernel/perf_regs.c | 85 +++++++++-
include/linux/perf_event.h | 23 +++
include/linux/perf_regs.h | 29 +++-
include/uapi/linux/perf_event.h | 8 +
kernel/events/core.c | 63 +++++--
21 files changed, 699 insertions(+), 54 deletions(-)
--
2.38.1