Message-Id: <20250613134943.3186517-1-kan.liang@linux.intel.com>
Date: Fri, 13 Jun 2025 06:49:31 -0700
From: kan.liang@...ux.intel.com
To: peterz@...radead.org,
	mingo@...hat.com,
	acme@...nel.org,
	namhyung@...nel.org,
	tglx@...utronix.de,
	dave.hansen@...ux.intel.com,
	irogers@...gle.com,
	adrian.hunter@...el.com,
	jolsa@...nel.org,
	alexander.shishkin@...ux.intel.com,
	linux-kernel@...r.kernel.org
Cc: dapeng1.mi@...ux.intel.com,
	ak@...ux.intel.com,
	zide.chen@...el.com,
	Kan Liang <kan.liang@...ux.intel.com>
Subject: [RFC PATCH 00/12] Support vector and more extended registers in perf

From: Kan Liang <kan.liang@...ux.intel.com>

Starting from Intel Ice Lake, the XMM registers can be collected in
a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP, and APX, will
be added in the upcoming architectural PEBS as well, but that requires
hardware support.

This patch set provides a software solution that mitigates the hardware
requirement. It uses the XSAVES instruction to retrieve the requested
registers in the overflow handler, so the feature is no longer limited to
PEBS events or to specific platforms.
The hardware solution (if available) is still preferred, since it has
lower overhead (especially with large PEBS) and is more accurate.
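
For context (not part of the series), below is a minimal userspace sketch of
how the XMM registers are requested via the existing ABI. Today the request
is only honored for PEBS-capable (precise) events on supported hardware; with
this series the same sample_regs_intr bits could instead be satisfied by
XSAVES in the overflow handler, for any event.

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/perf_event.h>
  #include <asm/perf_regs.h>              /* PERF_REG_X86_XMM0, x86 only */

  /* Sample IP plus all XMM registers at interrupt time. */
  static int open_cycles_with_xmm(void)
  {
          struct perf_event_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size          = sizeof(attr);
          attr.type          = PERF_TYPE_HARDWARE;
          attr.config        = PERF_COUNT_HW_CPU_CYCLES;
          attr.sample_period = 100000;
          attr.sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_INTR;
          attr.precise_ip    = 1;   /* PEBS today; not required with this series */
          /* XMM0..XMM15 occupy bits 32..63, two bits per 128-bit register. */
          attr.sample_regs_intr = ~0ULL << PERF_REG_X86_XMM0;

          return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
  }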

In theory, the solution should work on all x86 platforms, but I only
have newer Intel platforms to test on. The patch set therefore only
enables the feature for Intel Ice Lake and later platforms.

Open:
The new registers include YMM, ZMM, OPMASK, SSP, and APX.
The sample_regs_user/intr bitmap has run out of bits, so a new field in
struct perf_event_attr is required for these registers. A short reminder
of why the existing bitmap is full follows, then several options for the
new field.
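
The excerpt below is recalled from arch/x86/include/uapi/asm/perf_regs.h
(worth double-checking against the tree); it shows why there is no spare
bit left in the existing 64-bit sample_regs bitmap.

  enum perf_event_x86_regs {
          /* bits 0..23: GPRs, segment registers, IP and FLAGS */
          ...
          /* each 128-bit XMM register takes two u64 sample slots */
          PERF_REG_X86_XMM0  = 32,
          ...
          PERF_REG_X86_XMM15 = 62,
          PERF_REG_X86_XMM_MAX = PERF_REG_X86_XMM15 + 2,  /* == 64 */
  };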

- Follow a format similar to XSAVES. Introduce the fields below to store
  the bitmap of the registers.
  struct perf_event_attr {
        ...
        __u64   sample_ext_regs_intr[2];
        __u64   sample_ext_regs_user[2];
        ...
  }
  They cover YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
  ZMMH0-15 (16 bits), H16ZMM (16 bits), and SSP.
  For example, if a user wants YMM8, the perf tool needs to set the
  corresponding bits for both XMM8 and YMMH8, and reconstruct the result
  (a hypothetical sketch follows after this list).
  The method is similar to the existing handling of
  sample_regs_user/intr, and matches the XSAVES format.
  The kernel doesn't need to do any extra configuration or reconstruction.
  This is what the patch set implements.

- Similar to the above method, but the fields are bitmaps of the
  complete registers, e.g., YMM (16 bits), APX (16 bits),
  OPMASK (8 bits), ZMM (32 bits), and SSP.
  The kernel needs to do extra configuration and reconstruction,
  which may bring extra overhead.

- Combine XMM, YMM, and ZMM, so that all the registers can be put into
  one __u64 field.
        ...
        union {
                __u64 sample_ext_regs_intr;   //sample_ext_regs_user is similar
                struct {
                        __u32 vector_bitmap;
                        __u32 vector_type   : 3, //0b001 XMM 0b010 YMM 0b100 ZMM
                              apx_bitmap    : 16,
                              opmask_bitmap : 8,
                              ssp_bitmap    : 1,
                              reserved      : 4;
                };
        };
        ...
  For example, if YMM8-15 are required:
  vector_bitmap: 0x0000ff00
  vector_type: 0x2
  (see the short usage sketch after this list)
  This method saves two __u64 in struct perf_event_attr, but it is less
  straightforward since it mixes the type and the bitmap, and the kernel
  also needs to do extra configuration and reconstruction.
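
To make the first option concrete, here is a hypothetical sketch of
requesting YMM8. The bit position chosen for YMMH8 in the new bitmap is
invented for illustration (the real layout would be defined by the uapi
header added in the series); only the flow of setting XMM8 in the existing
bitmap plus YMMH8 in the new one, and letting the tool stitch the halves
together, comes from the description above.

  /* Assumed for illustration: YMMH0..YMMH15 are bits 0..15 of word 0. */
  #define EXT_REG_YMMH0   0

  static void request_ymm8(struct perf_event_attr *attr)
  {
          attr->sample_type |= PERF_SAMPLE_REGS_INTR;

          /* Low half: XMM8 takes two slots in the existing 64-bit bitmap. */
          attr->sample_regs_intr |= 3ULL << PERF_REG_X86_XMM8;

          /* High half: YMMH8 in the proposed sample_ext_regs_intr[] bitmap. */
          attr->sample_ext_regs_intr[0] |= 1ULL << (EXT_REG_YMMH0 + 8);
  }

  /* The perf tool then reconstructs YMM8 as YMMH8:XMM8 from the sample. */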
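
And a similarly hypothetical usage of the third option's union, just
plugging in the YMM8-15 example values from above (the field names are the
ones proposed in the union):

  static void request_ymm8_to_15(struct perf_event_attr *attr)
  {
          attr->sample_type   |= PERF_SAMPLE_REGS_INTR;
          attr->vector_bitmap  = 0x0000ff00;  //registers 8..15
          attr->vector_type    = 0x2;         //0b010 selects YMM
  }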

Please let me know if you have other ideas.

Thanks,
Kan



Kan Liang (12):
  perf/x86: Use x86_perf_regs in the x86 nmi handler
  perf/x86: Setup the regs data
  x86/fpu/xstate: Add xsaves_nmi
  perf: Move has_extended_regs() to header file
  perf/x86: Support XMM register for non-PEBS and REGS_USER
  perf: Support extension of sample_regs
  perf/x86: Add YMMH in extended regs
  perf/x86: Add APX in extended regs
  perf/x86: Add OPMASK in extended regs
  perf/x86: Add ZMM in extended regs
  perf/x86: Add SSP in extended regs
  perf/x86/intel: Support extended registers

 arch/arm/kernel/perf_regs.c           |   9 +-
 arch/arm64/kernel/perf_regs.c         |   9 +-
 arch/csky/kernel/perf_regs.c          |   9 +-
 arch/loongarch/kernel/perf_regs.c     |   8 +-
 arch/mips/kernel/perf_regs.c          |   9 +-
 arch/powerpc/perf/perf_regs.c         |   9 +-
 arch/riscv/kernel/perf_regs.c         |   8 +-
 arch/s390/kernel/perf_regs.c          |   9 +-
 arch/x86/events/core.c                | 226 ++++++++++++++++++++++++--
 arch/x86/events/intel/core.c          |  49 ++++++
 arch/x86/events/intel/ds.c            |  12 +-
 arch/x86/events/perf_event.h          |  58 +++++++
 arch/x86/include/asm/fpu/xstate.h     |   1 +
 arch/x86/include/asm/perf_event.h     |   6 +
 arch/x86/include/uapi/asm/perf_regs.h | 101 ++++++++++++
 arch/x86/kernel/fpu/xstate.c          |  22 +++
 arch/x86/kernel/perf_regs.c           |  85 +++++++++-
 include/linux/perf_event.h            |  23 +++
 include/linux/perf_regs.h             |  29 +++-
 include/uapi/linux/perf_event.h       |   8 +
 kernel/events/core.c                  |  63 +++++--
 21 files changed, 699 insertions(+), 54 deletions(-)

-- 
2.38.1

