Message-ID: <20250416155327.GD17910@noisy.programming.kicks-ass.net>
Date: Wed, 16 Apr 2025 17:53:27 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Eranian Stephane <eranian@...gle.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, Dapeng Mi <dapeng1.mi@...el.com>
Subject: Re: [Patch v3 16/22] perf/core: Support to capture higher width
vector registers
On Wed, Apr 16, 2025 at 02:42:12PM +0800, Mi, Dapeng wrote:
> Just think twice, using bitmap to represent these extended registers indeed
> wastes bits and is hard to extend, there could be much much more vector
> registers if considering AMX.
*Groan* so AMX should never have been register state :-(
> Considering different arch/HW may support different number vector register,
> like platform A supports 8 XMM registers and 8 YMM registers, but platform
> B only supports 16 XMM registers, a better way to represent these vector
> registers may add two fields, one is a bitmap which represents which kinds
> of vector registers needs to be captures. The other field could be a u16
> array which represents the corresponding register length of each kind of
> vector register. It may look like this.
>
> #define PERF_SAMPLE_EXT_REGS_XMM BIT(0)
> #define PERF_SAMPLE_EXT_REGS_YMM BIT(1)
> #define PERF_SAMPLE_EXT_REGS_ZMM BIT(2)
> __u32 sample_regs_intr_ext;
> __u16 sample_regs_intr_ext_len[4];
> __u32 sample_regs_user_ext;
> __u16 sample_regs_user_ext_len[4];
>
>
> Peter, how do you think this? Thanks.
I'm not entirely sure I understand.
How about something like:
__u16 sample_simd_reg_words;
__u64 sample_simd_reg_intr;
__u64 sample_simd_reg_user;
Then simd_reg_words tells us how many (quad) words there are per register
(8 for 512 bits) and simd_reg_{intr,user} are simple bitmaps, one bit per
actual SIMD reg.
So then all of XMM would be:
words = 2;
intr = user = 0xFFFF;
(16 regs, 128 wide)
Whereas ZMM would be:
words = 8;
intr = user = 0xFFFFFFFF;
(32 regs, 512 wide)
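To make the arithmetic concrete, here is a minimal userspace sketch of how a consumer might size the SIMD payload of one sample under this layout. The struct and the helper names (simd_sample_cfg, simd_popcount64, simd_sample_bytes) are hypothetical and only for illustration; they are not part of the patch or of any existing perf ABI. It assumes one word is 64 bits, as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three proposed fields. */
struct simd_sample_cfg {
	uint16_t sample_simd_reg_words;	/* 64-bit words per register */
	uint64_t sample_simd_reg_intr;	/* one bit per SIMD register */
	uint64_t sample_simd_reg_user;	/* one bit per SIMD register */
};

/* Count the set bits in a 64-bit mask (portable, no builtins). */
static unsigned int simd_popcount64(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Bytes needed to dump every register selected in 'mask' when each
 * register is 'words' quad words (8 bytes each) wide.
 */
static size_t simd_sample_bytes(uint16_t words, uint64_t mask)
{
	return (size_t)simd_popcount64(mask) * words * sizeof(uint64_t);
}
```

With the two examples above, simd_sample_bytes(2, 0xFFFF) gives 16 * 2 * 8 = 256 bytes for XMM, and simd_sample_bytes(8, 0xFFFFFFFF) gives 32 * 8 * 8 = 2048 bytes for ZMM.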
Would this be sufficient? Possibly we can split the words field into two
__u8, but does it make sense to ask for different vector widths for
intr and user?