Message-ID: <20250617082423.GK1613376@noisy.programming.kicks-ass.net>
Date: Tue, 17 Jun 2025 10:24:23 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: kan.liang@...ux.intel.com
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
tglx@...utronix.de, dave.hansen@...ux.intel.com, irogers@...gle.com,
adrian.hunter@...el.com, jolsa@...nel.org,
alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
dapeng1.mi@...ux.intel.com, ak@...ux.intel.com, zide.chen@...el.com
Subject: Re: [RFC PATCH 00/12] Support vector and more extended registers in
perf
On Fri, Jun 13, 2025 at 06:49:31AM -0700, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
>
> Starting from Intel Ice Lake, the XMM registers can be collected in
> a PEBS record. More registers, e.g., YMM, ZMM, OPMASK, SSP, and APX, will
> be added in the upcoming architectural PEBS as well, but that requires
> hardware support.
>
> The patch set provides a software solution to mitigate the hardware
> requirement. It utilizes the XSAVES instruction to retrieve the requested
> registers in the overflow handler, so the feature is no longer limited to
> PEBS events or specific platforms.
> The hardware solution (if available) is still preferred, since it has
> lower overhead (especially with large PEBS) and is more accurate.
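
(Illustration only -- a rough sketch of the XSAVES-based capture described
above, not the actual patch code; buffer management and error handling are
omitted and would go through the kernel's fpu/xstate helpers in practice.)

	#include <linux/string.h>
	#include <asm/fpu/types.h>
	#include <asm/fpu/xstate.h>

	/*
	 * Capture only the YMMH component (XSAVE component 2) into a
	 * 64-byte-aligned buffer from the overflow handler.  XSAVES takes
	 * the requested-component bitmap in EDX:EAX and stores the data
	 * in compacted format.
	 */
	static void capture_ymmh(struct xregs_state *buf)
	{
		u64 mask = XFEATURE_MASK_YMM;

		memset(&buf->header, 0, sizeof(buf->header));
		asm volatile("xsaves64 %[buf]"
			     : [buf] "+m" (*buf)
			     : "a" ((u32)mask), "d" ((u32)(mask >> 32))
			     : "memory");
		/* copy the YMMH slots from their compacted-format offset
		 * into the perf sample */
	}
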
>
> In theory, the solution should work on all x86 platforms, but I only
> have newer Intel platforms to test on. The patch set only enables the
> feature for Intel Ice Lake and later platforms.
>
> Open:
> The new registers include YMM, ZMM, OPMASK, SSP, and APX.
> The existing sample_regs_user/intr bitmaps have run out of bits, so a
> new field in struct perf_event_attr is required for these registers.
> There are several options for the new field, as described below.
>
> - Follow a format similar to XSAVES. Introduce the below fields to store
> the bitmaps of the registers.
>   struct perf_event_attr {
>           ...
>           __u64   sample_ext_regs_intr[2];
>           __u64   sample_ext_regs_user[2];
>           ...
>   };
> Includes YMMH (16 bits), APX (16 bits), OPMASK (8 bits),
> ZMMH0-15 (16 bits), H16ZMM (16 bits), and SSP (1 bit).
> For example, if a user wants YMM8, the perf tool needs to set the
> corresponding bits for both XMM8 and YMMH8, and reconstruct the result.
> The method is similar to the existing handling of
> sample_regs_user/intr, and matches the XSAVES format.
> The kernel doesn't need to do any extra configuration or reconstruction.
> This option is implemented in the patch set.
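
(Illustration only, under the layout proposed above: sample_ext_regs_intr
does not exist in current headers, and YMMH8 is assumed to sit at bit 8 of
the first extended word.  Roughly what the tool side would do to request
YMM8:)

	#include <string.h>
	#include <linux/perf_event.h>
	#include <asm/perf_regs.h>

	static void request_ymm8_intr(struct perf_event_attr *attr)
	{
		memset(attr, 0, sizeof(*attr));
		attr->size = sizeof(*attr);
		attr->type = PERF_TYPE_HARDWARE;
		attr->config = PERF_COUNT_HW_CPU_CYCLES;
		attr->sample_type = PERF_SAMPLE_REGS_INTR;

		/* low 128 bits: both qwords of XMM8 via the existing bitmap */
		attr->sample_regs_intr = 0x3ULL << PERF_REG_X86_XMM8;

		/* high 128 bits: YMMH8 via the proposed (hypothetical)
		 * extended bitmap */
		attr->sample_ext_regs_intr[0] = 1ULL << 8;
	}

The tool then stitches XMM8 and YMMH8 back together into the 256-bit value
when decoding the sample.
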
>
> - Similar to the above method, but the fields are bitmaps of the
> complete registers, e.g., YMM (16 bits), APX (16 bits),
> OPMASK (8 bits), ZMM (32 bits), and SSP (1 bit).
> The kernel needs to do extra configuration and reconstruction,
> which may bring extra overhead.
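
(Illustration only: the extra translation the kernel would need with
full-register bitmaps; the helper and struct below are hypothetical.)

	/*
	 * Translate a 16-bit "full YMM" request into the per-component
	 * masks XSAVES actually works with: a full YMM register is XMMn
	 * (SSE component) plus YMMHn (AVX component), so one user-visible
	 * bit fans out to two components.
	 */
	struct ext_reg_request {
		u64 xsave_components;	/* RFBM passed to XSAVES */
		u16 xmm_bitmap;		/* low halves, SSE component */
		u16 ymmh_bitmap;	/* high halves, AVX component */
	};

	static void ymm_to_xsave(u16 ymm_bitmap, struct ext_reg_request *req)
	{
		req->xmm_bitmap  = ymm_bitmap;
		req->ymmh_bitmap = ymm_bitmap;
		req->xsave_components = ymm_bitmap ?
			XFEATURE_MASK_SSE | XFEATURE_MASK_YMM : 0;
	}

On the way out, the two halves have to be glued back together before being
copied into the sample, which is the reconstruction overhead mentioned above.
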
>
> - Combine the XMM, YMM, and ZMM bitmaps, so all the registers can be put
> into one __u64 field.
>   ...
>   union {
>           __u64 sample_ext_regs_intr;     /* sample_ext_regs_user is similar */
>           struct {
>                   __u32 vector_bitmap;
>                   __u32 vector_type   : 3,  /* 0b001 XMM, 0b010 YMM, 0b100 ZMM */
>                         apx_bitmap    : 16,
>                         opmask_bitmap : 8,
>                         ssp_bitmap    : 1,
>                         reserved      : 4;
>           };
>   };
>   ...
> For example, if YMM8-15 are required:
>   vector_bitmap: 0x0000ff00
>   vector_type:   0x2
> This method can save two __u64 fields in struct perf_event_attr,
> but it's not straightforward, since it mixes the type and the bitmap.
> The kernel also needs to do extra configuration and reconstruction.
>
> Please let me know if there are more ideas.

https://lkml.kernel.org/r/20250416155327.GD17910@noisy.programming.kicks-ass.net
comes to mind. Combine that with a rule that reclaims the XMM register
space from perf_event_x86_regs when sample_simd_reg_words != 0, and then
we can put APX and SSP there.
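
Roughly, as a sketch (register names and values purely illustrative, not a
worked-out ABI): with the vector registers described by the SIMD fields, the
32 bit positions currently taken by XMM0-15 become available, which is
enough for the 16 APX eGPRs plus SSP.

	enum perf_event_x86_regs {
		...
		/*
		 * When sample_simd_reg_words != 0, the XMM0-15 space
		 * (bits 32-63) is reinterpreted:
		 */
		PERF_REG_X86_R16 = 32,	/* APX eGPRs R16-R31: bits 32-47 */
		...
		PERF_REG_X86_R31 = 47,
		PERF_REG_X86_SSP = 48,	/* shadow stack pointer */
	};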