Message-ID: <CAP-5=fWdHyLM=-cWucJNHJMuSHDP0vt8y1_B7cy3D=MAhuE_6Q@mail.gmail.com>
Date: Wed, 21 Jan 2026 23:27:57 -0800
From: Ian Rogers <irogers@...gle.com>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>, Jiri Olsa <jolsa@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Andi Kleen <ak@...ux.intel.com>,
Eranian Stephane <eranian@...gle.com>, Mark Rutland <mark.rutland@....com>, broonie@...nel.org,
Ravi Bangoria <ravi.bangoria@....com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, Zide Chen <zide.chen@...el.com>,
Falcon Thomas <thomas.falcon@...el.com>, Dapeng Mi <dapeng1.mi@...el.com>,
Xudong Hao <xudong.hao@...el.com>, Kan Liang <kan.liang@...ux.intel.com>
Subject: Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format
On Wed, Jan 21, 2026 at 5:49 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
>
>
> On 1/21/2026 10:48 PM, Ian Rogers wrote:
> > On Tue, Jan 20, 2026 at 11:52 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
> >>
> >> On 1/21/2026 3:09 PM, Ian Rogers wrote:
> >>> On Tue, Jan 20, 2026 at 9:17 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
> >>>> On 1/21/2026 2:20 AM, Ian Rogers wrote:
> >>>>> On Tue, Jan 20, 2026 at 1:04 AM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
> >>>>>> On 1/20/2026 3:39 PM, Ian Rogers wrote:
> >>>>>>> On Tue, Dec 2, 2025 at 10:59 PM Dapeng Mi <dapeng1.mi@...ux.intel.com> wrote:
> >>>>>>>> From: Kan Liang <kan.liang@...ux.intel.com>
> >>>>>>>>
> >>>>>>>> This patch adds support for the newly introduced SIMD register sampling
> >>>>>>>> format by adding the following functions:
> >>>>>>>>
> >>>>>>>> uint64_t arch__intr_simd_reg_mask(void);
> >>>>>>>> uint64_t arch__user_simd_reg_mask(void);
> >>>>>>>> uint64_t arch__intr_pred_reg_mask(void);
> >>>>>>>> uint64_t arch__user_pred_reg_mask(void);
> >>>>>>>> uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>>
> >>>>>>>> The arch__{intr|user}_simd_reg_mask() functions retrieve the bitmap of
> >>>>>>>> supported SIMD registers, such as XMM/YMM/ZMM on x86 platforms.
> >>>>>>>>
> >>>>>>>> The arch__{intr|user}_pred_reg_mask() functions retrieve the bitmap of
> >>>>>>>> supported predicate (PRED) registers, such as OPMASK on x86 platforms.
> >>>>>>>>
> >>>>>>>> The arch__{intr|user}_simd_reg_bitmap_qwords() functions return the
> >>>>>>>> exact bitmap for a specific type of SIMD register and report the number
> >>>>>>>> of qwords per register through *qwords. For example, for XMM registers
> >>>>>>>> on x86 platforms, the returned bitmap is 0xffff (XMM0-XMM15) and the
> >>>>>>>> number of qwords is 2 (128 bits per XMM register).
> >>>>>>>>
> >>>>>>>> The arch__{intr|user}_pred_reg_bitmap_qwords() functions do the same for
> >>>>>>>> a specific type of PRED register. For example, for OPMASK registers on
> >>>>>>>> x86 platforms, the returned bitmap is 0xff (OPMASK0-OPMASK7) and the
> >>>>>>>> number of qwords is 1 (64 bits per OPMASK register).
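As a rough cross-check of the numbers above: for n registers the bitmap is BIT_ULL(n) - 1, and the qword count is the register width in bits divided by 64. The sketch below assumes only the counts and widths stated in the message (16 x 128-bit XMM, 8 x 64-bit OPMASK). The helper names reg_bitmap() and reg_qwords() are hypothetical and are not perf APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmap covering registers 0..nr_regs-1 (BIT_ULL(nr_regs) - 1). */
static uint64_t reg_bitmap(unsigned int nr_regs)
{
	assert(nr_regs >= 1 && nr_regs <= 64);
	/* Avoid the undefined 1ULL << 64 shift when all 64 bits are wanted. */
	return nr_regs == 64 ? ~0ULL : (1ULL << nr_regs) - 1;
}

/* Hypothetical helper: 64-bit qwords needed to hold one register of width_bits. */
static uint16_t reg_qwords(unsigned int width_bits)
{
	return (uint16_t)((width_bits + 63) / 64);
}
```

With the same helpers, 32 ZMM registers at 512 bits give a bitmap of 0xffffffff and 8 qwords.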
> >>>>>>>>
> >>>>>>>> Additionally, the function __parse_regs() is enhanced to support parsing
> >>>>>>>> these newly introduced SIMD registers. Currently, each type of register
> >>>>>>>> can only be sampled collectively; sampling a specific SIMD register is
> >>>>>>>> not supported. For example, all XMM registers are sampled together rather
> >>>>>>>> than sampling only XMM0.
> >>>>>>>>
> >>>>>>>> When overlapping register types, such as XMM and YMM, are requested
> >>>>>>>> together, only the superset (the YMM registers, whose low 128 bits are
> >>>>>>>> the XMM registers) is sampled.
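The superset rule can be modelled as keeping whichever requested type has the most qwords per register. This is roughly what the "Just need the highest qwords" branch in __parse_simd_regs() does. The following is a minimal sketch; struct simd_choice and pick_widest() are hypothetical names, and the YMM width (256 bits, 4 qwords) is the architectural size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one requested SIMD register type. */
struct simd_choice {
	const char *name;  /* e.g. "XMM", "YMM" */
	uint16_t qwords;   /* qwords per register */
	uint64_t bitmap;   /* which registers of this type */
};

/*
 * Return the requested type with the largest qwords value, or NULL for an
 * empty request. A wider type (YMM) already contains the narrower one (XMM)
 * in its low bits, so only the widest one needs to be sampled.
 */
static const struct simd_choice *pick_widest(const struct simd_choice *req, size_t n)
{
	const struct simd_choice *best = NULL;

	for (size_t i = 0; i < n; i++) {
		/* Strictly greater, like the "qwords > opts->sample_vec_regs_qwords" check. */
		if (!best || req[i].qwords > best->qwords)
			best = &req[i];
	}
	return best;
}
```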
> >>>>>>>>
> >>>>>>>> With this patch, all supported sampling registers on x86 platforms are
> >>>>>>>> displayed as follows.
> >>>>>>>>
> >>>>>>>> $ perf record -I?
> >>>>>>>> available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
> >>>>>>>> R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
> >>>>>>>> R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7
> >>>>>>>>
> >>>>>>>> $ perf record --user-regs=?
> >>>>>>>> available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10
> >>>>>>>> R11 R12 R13 R14 R15 R16 R17 R18 R19 R20 R21 R22 R23 R24 R25 R26 R27 R28
> >>>>>>>> R29 R30 R31 SSP XMM0-15 YMM0-15 ZMM0-31 OPMASK0-7
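The compact "XMM0-15" form in this listing appears to come from printing the type name, then "0-", then the index of the highest set bit, fls64(bitmap) - 1, as __print_simd_regs() does later in the patch. The sketch below makes a few assumptions. The GCC/Clang builtin __builtin_clzll stands in for the kernel's fls64(). format_reg_range() is a hypothetical helper. Like the patch, the sketch assumes each bitmap is contiguous from bit 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for fls64(): 1-based index of the highest set bit, 0 if none. */
static int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/*
 * Hypothetical helper: write "<name>0-<highest>" into buf, as in the -I? listing.
 * Returns the snprintf() result, or -1 for an empty bitmap (nothing printed).
 */
static int format_reg_range(char *buf, size_t len, const char *name, uint64_t bitmap)
{
	if (!bitmap)
		return -1;
	return snprintf(buf, len, "%s0-%d", name, fls64_sketch(bitmap) - 1);
}
```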
> >>>>>>>>
> >>>>>>>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> >>>>>>>> Co-developed-by: Dapeng Mi <dapeng1.mi@...ux.intel.com>
> >>>>>>>> Signed-off-by: Dapeng Mi <dapeng1.mi@...ux.intel.com>
> >>>>>>>> ---
> >>>>>>>> tools/perf/arch/x86/util/perf_regs.c | 470 +++++++++++++++++++++-
> >>>>>>>> tools/perf/util/evsel.c | 27 ++
> >>>>>>>> tools/perf/util/parse-regs-options.c | 151 ++++++-
> >>>>>>>> tools/perf/util/perf_event_attr_fprintf.c | 6 +
> >>>>>>>> tools/perf/util/perf_regs.c | 59 +++
> >>>>>>>> tools/perf/util/perf_regs.h | 11 +
> >>>>>>>> tools/perf/util/record.h | 6 +
> >>>>>>>> 7 files changed, 714 insertions(+), 16 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
> >>>>>>>> index 12fd93f04802..db41430f3b07 100644
> >>>>>>>> --- a/tools/perf/arch/x86/util/perf_regs.c
> >>>>>>>> +++ b/tools/perf/arch/x86/util/perf_regs.c
> >>>>>>>> @@ -13,6 +13,49 @@
> >>>>>>>> #include "../../../util/pmu.h"
> >>>>>>>> #include "../../../util/pmus.h"
> >>>>>>>>
> >>>>>>>> +static const struct sample_reg sample_reg_masks_ext[] = {
> >>>>>>>> + SMPL_REG(AX, PERF_REG_X86_AX),
> >>>>>>>> + SMPL_REG(BX, PERF_REG_X86_BX),
> >>>>>>>> + SMPL_REG(CX, PERF_REG_X86_CX),
> >>>>>>>> + SMPL_REG(DX, PERF_REG_X86_DX),
> >>>>>>>> + SMPL_REG(SI, PERF_REG_X86_SI),
> >>>>>>>> + SMPL_REG(DI, PERF_REG_X86_DI),
> >>>>>>>> + SMPL_REG(BP, PERF_REG_X86_BP),
> >>>>>>>> + SMPL_REG(SP, PERF_REG_X86_SP),
> >>>>>>>> + SMPL_REG(IP, PERF_REG_X86_IP),
> >>>>>>>> + SMPL_REG(FLAGS, PERF_REG_X86_FLAGS),
> >>>>>>>> + SMPL_REG(CS, PERF_REG_X86_CS),
> >>>>>>>> + SMPL_REG(SS, PERF_REG_X86_SS),
> >>>>>>>> +#ifdef HAVE_ARCH_X86_64_SUPPORT
> >>>>>>>> + SMPL_REG(R8, PERF_REG_X86_R8),
> >>>>>>>> + SMPL_REG(R9, PERF_REG_X86_R9),
> >>>>>>>> + SMPL_REG(R10, PERF_REG_X86_R10),
> >>>>>>>> + SMPL_REG(R11, PERF_REG_X86_R11),
> >>>>>>>> + SMPL_REG(R12, PERF_REG_X86_R12),
> >>>>>>>> + SMPL_REG(R13, PERF_REG_X86_R13),
> >>>>>>>> + SMPL_REG(R14, PERF_REG_X86_R14),
> >>>>>>>> + SMPL_REG(R15, PERF_REG_X86_R15),
> >>>>>>>> + SMPL_REG(R16, PERF_REG_X86_R16),
> >>>>>>>> + SMPL_REG(R17, PERF_REG_X86_R17),
> >>>>>>>> + SMPL_REG(R18, PERF_REG_X86_R18),
> >>>>>>>> + SMPL_REG(R19, PERF_REG_X86_R19),
> >>>>>>>> + SMPL_REG(R20, PERF_REG_X86_R20),
> >>>>>>>> + SMPL_REG(R21, PERF_REG_X86_R21),
> >>>>>>>> + SMPL_REG(R22, PERF_REG_X86_R22),
> >>>>>>>> + SMPL_REG(R23, PERF_REG_X86_R23),
> >>>>>>>> + SMPL_REG(R24, PERF_REG_X86_R24),
> >>>>>>>> + SMPL_REG(R25, PERF_REG_X86_R25),
> >>>>>>>> + SMPL_REG(R26, PERF_REG_X86_R26),
> >>>>>>>> + SMPL_REG(R27, PERF_REG_X86_R27),
> >>>>>>>> + SMPL_REG(R28, PERF_REG_X86_R28),
> >>>>>>>> + SMPL_REG(R29, PERF_REG_X86_R29),
> >>>>>>>> + SMPL_REG(R30, PERF_REG_X86_R30),
> >>>>>>>> + SMPL_REG(R31, PERF_REG_X86_R31),
> >>>>>>>> + SMPL_REG(SSP, PERF_REG_X86_SSP),
> >>>>>>>> +#endif
> >>>>>>>> + SMPL_REG_END
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> static const struct sample_reg sample_reg_masks[] = {
> >>>>>>>> SMPL_REG(AX, PERF_REG_X86_AX),
> >>>>>>>> SMPL_REG(BX, PERF_REG_X86_BX),
> >>>>>>>> @@ -276,27 +319,404 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op)
> >>>>>>>> return SDT_ARG_VALID;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +static bool support_simd_reg(u64 sample_type, u16 qwords, u64 mask, bool pred)
> >>>>>>>> +{
> >>>>>>>> + struct perf_event_attr attr = {
> >>>>>>>> + .type = PERF_TYPE_HARDWARE,
> >>>>>>>> + .config = PERF_COUNT_HW_CPU_CYCLES,
> >>>>>>>> + .sample_type = sample_type,
> >>>>>>>> + .disabled = 1,
> >>>>>>>> + .exclude_kernel = 1,
> >>>>>>>> + .sample_simd_regs_enabled = 1,
> >>>>>>>> + };
> >>>>>>>> + int fd;
> >>>>>>>> +
> >>>>>>>> + attr.sample_period = 1;
> >>>>>>>> +
> >>>>>>>> + if (!pred) {
> >>>>>>>> + attr.sample_simd_vec_reg_qwords = qwords;
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR)
> >>>>>>>> + attr.sample_simd_vec_reg_intr = mask;
> >>>>>>>> + else
> >>>>>>>> + attr.sample_simd_vec_reg_user = mask;
> >>>>>>>> + } else {
> >>>>>>>> + attr.sample_simd_pred_reg_qwords = PERF_X86_OPMASK_QWORDS;
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR)
> >>>>>>>> + attr.sample_simd_pred_reg_intr = PERF_X86_SIMD_PRED_MASK;
> >>>>>>>> + else
> >>>>>>>> + attr.sample_simd_pred_reg_user = PERF_X86_SIMD_PRED_MASK;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if (perf_pmus__num_core_pmus() > 1) {
> >>>>>>>> + struct perf_pmu *pmu = NULL;
> >>>>>>>> + __u64 type = PERF_TYPE_RAW;
> >>>>>>>> +
> >>>>>>>> + /*
> >>>>>>>> + * The same register set is supported among different hybrid PMUs.
> >>>>>>>> + * Only check the first available one.
> >>>>>>>> + */
> >>>>>>>> + while ((pmu = perf_pmus__scan_core(pmu)) != NULL) {
> >>>>>>>> + type = pmu->type;
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> + attr.config |= type << PERF_PMU_TYPE_SHIFT;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + event_attr_init(&attr);
> >>>>>>>> +
> >>>>>>>> + fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> >>>>>>>> + if (fd != -1) {
> >>>>>>>> + close(fd);
> >>>>>>>> + return true;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return false;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool __arch_simd_reg_mask(u64 sample_type, int reg, uint64_t *mask, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + bool supported = false;
> >>>>>>>> + u64 bits;
> >>>>>>>> +
> >>>>>>>> + *mask = 0;
> >>>>>>>> + *qwords = 0;
> >>>>>>>> +
> >>>>>>>> + switch (reg) {
> >>>>>>>> + case PERF_REG_X86_XMM:
> >>>>>>>> + bits = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
> >>>>>>>> + supported = support_simd_reg(sample_type, PERF_X86_XMM_QWORDS, bits, false);
> >>>>>>>> + if (supported) {
> >>>>>>>> + *mask = bits;
> >>>>>>>> + *qwords = PERF_X86_XMM_QWORDS;
> >>>>>>>> + }
> >>>>>>>> + break;
> >>>>>>>> + case PERF_REG_X86_YMM:
> >>>>>>>> + bits = BIT_ULL(PERF_X86_SIMD_YMM_REGS) - 1;
> >>>>>>>> + supported = support_simd_reg(sample_type, PERF_X86_YMM_QWORDS, bits, false);
> >>>>>>>> + if (supported) {
> >>>>>>>> + *mask = bits;
> >>>>>>>> + *qwords = PERF_X86_YMM_QWORDS;
> >>>>>>>> + }
> >>>>>>>> + break;
> >>>>>>>> + case PERF_REG_X86_ZMM:
> >>>>>>>> + bits = BIT_ULL(PERF_X86_SIMD_ZMM_REGS) - 1;
> >>>>>>>> + supported = support_simd_reg(sample_type, PERF_X86_ZMM_QWORDS, bits, false);
> >>>>>>>> + if (supported) {
> >>>>>>>> + *mask = bits;
> >>>>>>>> + *qwords = PERF_X86_ZMM_QWORDS;
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + bits = BIT_ULL(PERF_X86_SIMD_ZMMH_REGS) - 1;
> >>>>>>>> + supported = support_simd_reg(sample_type, PERF_X86_ZMM_QWORDS, bits, false);
> >>>>>>>> + if (supported) {
> >>>>>>>> + *mask = bits;
> >>>>>>>> + *qwords = PERF_X86_ZMMH_QWORDS;
> >>>>>>>> + }
> >>>>>>>> + break;
> >>>>>>>> + default:
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return supported;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool __arch_pred_reg_mask(u64 sample_type, int reg, uint64_t *mask, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + bool supported = false;
> >>>>>>>> + u64 bits;
> >>>>>>>> +
> >>>>>>>> + *mask = 0;
> >>>>>>>> + *qwords = 0;
> >>>>>>>> +
> >>>>>>>> + switch (reg) {
> >>>>>>>> + case PERF_REG_X86_OPMASK:
> >>>>>>>> + bits = BIT_ULL(PERF_X86_SIMD_OPMASK_REGS) - 1;
> >>>>>>>> + supported = support_simd_reg(sample_type, PERF_X86_OPMASK_QWORDS, bits, true);
> >>>>>>>> + if (supported) {
> >>>>>>>> + *mask = bits;
> >>>>>>>> + *qwords = PERF_X86_OPMASK_QWORDS;
> >>>>>>>> + }
> >>>>>>>> + break;
> >>>>>>>> + default:
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return supported;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool has_cap_simd_regs(void)
> >>>>>>>> +{
> >>>>>>>> + uint64_t mask = BIT_ULL(PERF_X86_SIMD_XMM_REGS) - 1;
> >>>>>>>> + u16 qwords = PERF_X86_XMM_QWORDS;
> >>>>>>>> + static bool has_cap_simd_regs;
> >>>>>>>> + static bool cached;
> >>>>>>>> +
> >>>>>>>> + if (cached)
> >>>>>>>> + return has_cap_simd_regs;
> >>>>>>>> +
> >>>>>>>> + has_cap_simd_regs = __arch_simd_reg_mask(PERF_SAMPLE_REGS_INTR,
> >>>>>>>> + PERF_REG_X86_XMM, &mask, &qwords);
> >>>>>>>> + has_cap_simd_regs |= __arch_simd_reg_mask(PERF_SAMPLE_REGS_USER,
> >>>>>>>> + PERF_REG_X86_XMM, &mask, &qwords);
> >>>>>>>> + cached = true;
> >>>>>>>> +
> >>>>>>>> + return has_cap_simd_regs;
> >>>>>>>> +}
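has_cap_simd_regs() and the x86_*_updated flags seem to follow a compute-once cache. The first call runs the comparatively expensive perf_event_open() probes, and later calls return the stored answer. Below is a single-threaded sketch of that pattern. A hypothetical probe callback replaces the real syscall so the caching logic can be exercised on its own. As with the static variables in the patch, the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical probe signature; the patch calls sys_perf_event_open() instead. */
typedef bool (*probe_fn)(void);

static bool cached_result;
static bool cached_valid;

/* Run probe() at most once per process and return the remembered answer. */
static bool cached_probe(probe_fn probe)
{
	if (cached_valid)
		return cached_result;
	cached_result = probe();
	cached_valid = true;
	return cached_result;
}

/* Example probe that counts how often it actually runs. */
static int probe_calls;

static bool counting_probe(void)
{
	probe_calls++;
	return true;
}
```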
> >>>>>>>> +
> >>>>>>>> +bool arch_has_simd_regs(u64 mask)
> >>>>>>>> +{
> >>>>>>>> + return has_cap_simd_regs() &&
> >>>>>>>> + mask & GENMASK_ULL(PERF_REG_X86_SSP, PERF_REG_X86_R16);
> >>>>>>>> +}
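arch_has_simd_regs() returns true only when two conditions hold: the SIMD capability probe succeeded, and the GPR mask touches the extended range built by GENMASK_ULL(PERF_REG_X86_SSP, PERF_REG_X86_R16), which sets bits l through h inclusive. The following is a sketch with a userspace GENMASK_ULL equivalent. The register indices are placeholders chosen for illustration only. They are not the values defined by this series' uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalent of GENMASK_ULL(h, l): bits l..h set, requires h >= l. */
#define GENMASK_ULL_SKETCH(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Placeholder indices for illustration; the real PERF_REG_X86_* values may differ. */
enum { SKETCH_REG_R16 = 24, SKETCH_REG_SSP = 40 };

/* Same shape as arch_has_simd_regs(): capability AND any extended GPR requested. */
static bool wants_extended_gprs(bool has_cap, uint64_t mask)
{
	return has_cap &&
	       (mask & GENMASK_ULL_SKETCH(SKETCH_REG_SSP, SKETCH_REG_R16)) != 0;
}
```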
> >>>>>>>> +
> >>>>>>>> +static const struct sample_reg sample_simd_reg_masks[] = {
> >>>>>>>> + SMPL_REG(XMM, PERF_REG_X86_XMM),
> >>>>>>>> + SMPL_REG(YMM, PERF_REG_X86_YMM),
> >>>>>>>> + SMPL_REG(ZMM, PERF_REG_X86_ZMM),
> >>>>>>>> + SMPL_REG_END
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> +static const struct sample_reg sample_pred_reg_masks[] = {
> >>>>>>>> + SMPL_REG(OPMASK, PERF_REG_X86_OPMASK),
> >>>>>>>> + SMPL_REG_END
> >>>>>>>> +};
> >>>>>>>> +
> >>>>>>>> +const struct sample_reg *arch__sample_simd_reg_masks(void)
> >>>>>>>> +{
> >>>>>>>> + return sample_simd_reg_masks;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +const struct sample_reg *arch__sample_pred_reg_masks(void)
> >>>>>>>> +{
> >>>>>>>> + return sample_pred_reg_masks;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool x86_intr_simd_updated;
> >>>>>>>> +static u64 x86_intr_simd_reg_mask;
> >>>>>>>> +static u64 x86_intr_simd_mask[PERF_REG_X86_MAX_SIMD_REGS];
> >>>>>>>> +static u16 x86_intr_simd_qwords[PERF_REG_X86_MAX_SIMD_REGS];
> >>>>>>>> +static bool x86_user_simd_updated;
> >>>>>>>> +static u64 x86_user_simd_reg_mask;
> >>>>>>>> +static u64 x86_user_simd_mask[PERF_REG_X86_MAX_SIMD_REGS];
> >>>>>>>> +static u16 x86_user_simd_qwords[PERF_REG_X86_MAX_SIMD_REGS];
> >>>>>>>> +
> >>>>>>>> +static bool x86_intr_pred_updated;
> >>>>>>>> +static u64 x86_intr_pred_reg_mask;
> >>>>>>>> +static u64 x86_intr_pred_mask[PERF_REG_X86_MAX_PRED_REGS];
> >>>>>>>> +static u16 x86_intr_pred_qwords[PERF_REG_X86_MAX_PRED_REGS];
> >>>>>>>> +static bool x86_user_pred_updated;
> >>>>>>>> +static u64 x86_user_pred_reg_mask;
> >>>>>>>> +static u64 x86_user_pred_mask[PERF_REG_X86_MAX_PRED_REGS];
> >>>>>>>> +static u16 x86_user_pred_qwords[PERF_REG_X86_MAX_PRED_REGS];
> >>>>>>>> +
> >>>>>>>> +static uint64_t __arch__simd_reg_mask(u64 sample_type)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + bool supported;
> >>>>>>>> + u64 mask = 0;
> >>>>>>>> + int reg;
> >>>>>>>> +
> >>>>>>>> + if (!has_cap_simd_regs())
> >>>>>>>> + return 0;
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR && x86_intr_simd_updated)
> >>>>>>>> + return x86_intr_simd_reg_mask;
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_USER && x86_user_simd_updated)
> >>>>>>>> + return x86_user_simd_reg_mask;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_simd_reg_masks(); r->name; r++) {
> >>>>>>>> + supported = false;
> >>>>>>>> +
> >>>>>>>> + if (!r->mask)
> >>>>>>>> + continue;
> >>>>>>>> + reg = fls64(r->mask) - 1;
> >>>>>>>> +
> >>>>>>>> + if (reg >= PERF_REG_X86_MAX_SIMD_REGS)
> >>>>>>>> + break;
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR)
> >>>>>>>> + supported = __arch_simd_reg_mask(sample_type, reg,
> >>>>>>>> + &x86_intr_simd_mask[reg],
> >>>>>>>> + &x86_intr_simd_qwords[reg]);
> >>>>>>>> + else if (sample_type == PERF_SAMPLE_REGS_USER)
> >>>>>>>> + supported = __arch_simd_reg_mask(sample_type, reg,
> >>>>>>>> + &x86_user_simd_mask[reg],
> >>>>>>>> + &x86_user_simd_qwords[reg]);
> >>>>>>>> + if (supported)
> >>>>>>>> + mask |= BIT_ULL(reg);
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR) {
> >>>>>>>> + x86_intr_simd_reg_mask = mask;
> >>>>>>>> + x86_intr_simd_updated = true;
> >>>>>>>> + } else {
> >>>>>>>> + x86_user_simd_reg_mask = mask;
> >>>>>>>> + x86_user_simd_updated = true;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> +}
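__arch__simd_reg_mask() walks the sample_simd_reg_masks[] table. For each single-bit entry it computes an index with fls64(r->mask) - 1, probes that register type, and ORs BIT_ULL(index) into the result. Below is a sketch of that walk. It uses a hypothetical table and a stubbed probe that pretends only the first two types are supported. __builtin_clzll again stands in for fls64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sample_reg: a name plus a one-bit mask. */
struct reg_entry {
	const char *name;
	uint64_t mask;
};

static int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/* Stub probe: pretend indices 0 and 1 (e.g. XMM, YMM) are supported, others not. */
static bool stub_supported(int idx)
{
	return idx < 2;
}

/* Walk a NULL-name-terminated table and collect supported entries as a bitmap. */
static uint64_t collect_supported(const struct reg_entry *tbl, int max_regs)
{
	uint64_t mask = 0;

	for (const struct reg_entry *r = tbl; r->name; r++) {
		int idx;

		if (!r->mask)
			continue;
		idx = fls64_sketch(r->mask) - 1;
		if (idx >= max_regs)
			break; /* same early exit as the PERF_REG_X86_MAX_SIMD_REGS check */
		if (stub_supported(idx))
			mask |= 1ULL << idx;
	}
	return mask;
}
```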
> >>>>>>>> +
> >>>>>>>> +static uint64_t __arch__pred_reg_mask(u64 sample_type)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + bool supported;
> >>>>>>>> + u64 mask = 0;
> >>>>>>>> + int reg;
> >>>>>>>> +
> >>>>>>>> + if (!has_cap_simd_regs())
> >>>>>>>> + return 0;
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR && x86_intr_pred_updated)
> >>>>>>>> + return x86_intr_pred_reg_mask;
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_USER && x86_user_pred_updated)
> >>>>>>>> + return x86_user_pred_reg_mask;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_pred_reg_masks(); r->name; r++) {
> >>>>>>>> + supported = false;
> >>>>>>>> +
> >>>>>>>> + if (!r->mask)
> >>>>>>>> + continue;
> >>>>>>>> + reg = fls64(r->mask) - 1;
> >>>>>>>> +
> >>>>>>>> + if (reg >= PERF_REG_X86_MAX_PRED_REGS)
> >>>>>>>> + break;
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR)
> >>>>>>>> + supported = __arch_pred_reg_mask(sample_type, reg,
> >>>>>>>> + &x86_intr_pred_mask[reg],
> >>>>>>>> + &x86_intr_pred_qwords[reg]);
> >>>>>>>> + else if (sample_type == PERF_SAMPLE_REGS_USER)
> >>>>>>>> + supported = __arch_pred_reg_mask(sample_type, reg,
> >>>>>>>> + &x86_user_pred_mask[reg],
> >>>>>>>> + &x86_user_pred_qwords[reg]);
> >>>>>>>> + if (supported)
> >>>>>>>> + mask |= BIT_ULL(reg);
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR) {
> >>>>>>>> + x86_intr_pred_reg_mask = mask;
> >>>>>>>> + x86_intr_pred_updated = true;
> >>>>>>>> + } else {
> >>>>>>>> + x86_user_pred_reg_mask = mask;
> >>>>>>>> + x86_user_pred_updated = true;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__intr_simd_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return __arch__simd_reg_mask(PERF_SAMPLE_REGS_INTR);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__user_simd_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return __arch__simd_reg_mask(PERF_SAMPLE_REGS_USER);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__intr_pred_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return __arch__pred_reg_mask(PERF_SAMPLE_REGS_INTR);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__user_pred_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return __arch__pred_reg_mask(PERF_SAMPLE_REGS_USER);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static uint64_t arch__simd_reg_bitmap_qwords(int reg, u16 *qwords, bool intr)
> >>>>>>>> +{
> >>>>>>>> + uint64_t mask = 0;
> >>>>>>>> +
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + if (reg < PERF_REG_X86_MAX_SIMD_REGS) {
> >>>>>>>> + if (intr) {
> >>>>>>>> + *qwords = x86_intr_simd_qwords[reg];
> >>>>>>>> + mask = x86_intr_simd_mask[reg];
> >>>>>>>> + } else {
> >>>>>>>> + *qwords = x86_user_simd_qwords[reg];
> >>>>>>>> + mask = x86_user_simd_mask[reg];
> >>>>>>>> + }
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static uint64_t arch__pred_reg_bitmap_qwords(int reg, u16 *qwords, bool intr)
> >>>>>>>> +{
> >>>>>>>> + uint64_t mask = 0;
> >>>>>>>> +
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + if (reg < PERF_REG_X86_MAX_PRED_REGS) {
> >>>>>>>> + if (intr) {
> >>>>>>>> + *qwords = x86_intr_pred_qwords[reg];
> >>>>>>>> + mask = x86_intr_pred_mask[reg];
> >>>>>>>> + } else {
> >>>>>>>> + *qwords = x86_user_pred_qwords[reg];
> >>>>>>>> + mask = x86_user_pred_mask[reg];
> >>>>>>>> + }
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + if (!x86_intr_simd_updated)
> >>>>>>>> + arch__intr_simd_reg_mask();
> >>>>>>>> + return arch__simd_reg_bitmap_qwords(reg, qwords, true);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + if (!x86_user_simd_updated)
> >>>>>>>> + arch__user_simd_reg_mask();
> >>>>>>>> + return arch__simd_reg_bitmap_qwords(reg, qwords, false);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + if (!x86_intr_pred_updated)
> >>>>>>>> + arch__intr_pred_reg_mask();
> >>>>>>>> + return arch__pred_reg_bitmap_qwords(reg, qwords, true);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + if (!x86_user_pred_updated)
> >>>>>>>> + arch__user_pred_reg_mask();
> >>>>>>>> + return arch__pred_reg_bitmap_qwords(reg, qwords, false);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> const struct sample_reg *arch__sample_reg_masks(void)
> >>>>>>>> {
> >>>>>>>> + if (has_cap_simd_regs())
> >>>>>>>> + return sample_reg_masks_ext;
> >>>>>>>> return sample_reg_masks;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> -uint64_t arch__intr_reg_mask(void)
> >>>>>>>> +static uint64_t __arch__reg_mask(u64 sample_type, u64 mask, bool has_simd_regs)
> >>>>>>>> {
> >>>>>>>> struct perf_event_attr attr = {
> >>>>>>>> - .type = PERF_TYPE_HARDWARE,
> >>>>>>>> - .config = PERF_COUNT_HW_CPU_CYCLES,
> >>>>>>>> - .sample_type = PERF_SAMPLE_REGS_INTR,
> >>>>>>>> - .sample_regs_intr = PERF_REG_EXTENDED_MASK,
> >>>>>>>> - .precise_ip = 1,
> >>>>>>>> - .disabled = 1,
> >>>>>>>> - .exclude_kernel = 1,
> >>>>>>>> + .type = PERF_TYPE_HARDWARE,
> >>>>>>>> + .config = PERF_COUNT_HW_CPU_CYCLES,
> >>>>>>>> + .sample_type = sample_type,
> >>>>>>>> + .precise_ip = 1,
> >>>>>>>> + .disabled = 1,
> >>>>>>>> + .exclude_kernel = 1,
> >>>>>>>> + .sample_simd_regs_enabled = has_simd_regs,
> >>>>>>>> };
> >>>>>>>> int fd;
> >>>>>>>> /*
> >>>>>>>> * In an unnamed union, init it here to build on older gcc versions
> >>>>>>>> */
> >>>>>>>> attr.sample_period = 1;
> >>>>>>>> + if (sample_type == PERF_SAMPLE_REGS_INTR)
> >>>>>>>> + attr.sample_regs_intr = mask;
> >>>>>>>> + else
> >>>>>>>> + attr.sample_regs_user = mask;
> >>>>>>>>
> >>>>>>>> if (perf_pmus__num_core_pmus() > 1) {
> >>>>>>>> struct perf_pmu *pmu = NULL;
> >>>>>>>> @@ -318,13 +738,41 @@ uint64_t arch__intr_reg_mask(void)
> >>>>>>>> fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
> >>>>>>>> if (fd != -1) {
> >>>>>>>> close(fd);
> >>>>>>>> - return (PERF_REG_EXTENDED_MASK | PERF_REGS_MASK);
> >>>>>>>> + return mask;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> - return PERF_REGS_MASK;
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t arch__intr_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + uint64_t mask = PERF_REGS_MASK;
> >>>>>>>> +
> >>>>>>>> + if (has_cap_simd_regs()) {
> >>>>>>>> + mask |= __arch__reg_mask(PERF_SAMPLE_REGS_INTR,
> >>>>>>>> + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
> >>>>>>>> + true);
> >>>>>>>> + mask |= __arch__reg_mask(PERF_SAMPLE_REGS_INTR,
> >>>>>>>> + BIT_ULL(PERF_REG_X86_SSP),
> >>>>>>>> + true);
> >>>>>>>> + } else
> >>>>>>>> + mask |= __arch__reg_mask(PERF_SAMPLE_REGS_INTR, PERF_REG_EXTENDED_MASK, false);
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> uint64_t arch__user_reg_mask(void)
> >>>>>>>> {
> >>>>>>>> - return PERF_REGS_MASK;
> >>>>>>>> + uint64_t mask = PERF_REGS_MASK;
> >>>>>>>> +
> >>>>>>>> + if (has_cap_simd_regs()) {
> >>>>>>>> + mask |= __arch__reg_mask(PERF_SAMPLE_REGS_USER,
> >>>>>>>> + GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
> >>>>>>>> + true);
> >>>>>>>> + mask |= __arch__reg_mask(PERF_SAMPLE_REGS_USER,
> >>>>>>>> + BIT_ULL(PERF_REG_X86_SSP),
> >>>>>>>> + true);
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return mask;
> >>>>>>>> }
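arch__{intr|user}_reg_mask() appear to probe R16-R31 and SSP with two separate __arch__reg_mask() calls. Each call returns the requested mask or 0, and the results are ORed into PERF_REGS_MASK. As a result, an unsupported group does not hide a supported one. Below is a sketch of that combination. A hypothetical try_mask() stands in for the perf_event_open() probe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel stand-in: accept a request only if every bit is in `accepted`. */
static uint64_t try_mask(uint64_t request, uint64_t accepted)
{
	return (request & ~accepted) ? 0 : request;
}

/*
 * Probe two optional groups separately and OR in whatever is accepted, so a
 * rejected group (e.g. SSP) does not hide an accepted one (e.g. R16-R31).
 */
static uint64_t build_reg_mask(uint64_t base, uint64_t group_a, uint64_t group_b,
			       uint64_t accepted)
{
	uint64_t mask = base;

	mask |= try_mask(group_a, accepted);
	mask |= try_mask(group_b, accepted);
	return mask;
}
```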
> >>>>>>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> >>>>>>>> index 56ebefd075f2..5d1d90cf9488 100644
> >>>>>>>> --- a/tools/perf/util/evsel.c
> >>>>>>>> +++ b/tools/perf/util/evsel.c
> >>>>>>>> @@ -1461,12 +1461,39 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
> >>>>>>>> if (opts->sample_intr_regs && !evsel->no_aux_samples &&
> >>>>>>>> !evsel__is_dummy_event(evsel)) {
> >>>>>>>> attr->sample_regs_intr = opts->sample_intr_regs;
> >>>>>>>> + attr->sample_simd_regs_enabled = arch_has_simd_regs(attr->sample_regs_intr);
> >>>>>>>> + evsel__set_sample_bit(evsel, REGS_INTR);
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if ((opts->sample_intr_vec_regs || opts->sample_intr_pred_regs) &&
> >>>>>>>> + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
> >>>>>>>> + /* A non-zero pred qwords implies the SIMD register set is used */
> >>>>>>>> + if (opts->sample_pred_regs_qwords)
> >>>>>>>> + attr->sample_simd_pred_reg_qwords = opts->sample_pred_regs_qwords;
> >>>>>>>> + else
> >>>>>>>> + attr->sample_simd_pred_reg_qwords = 1;
> >>>>>>>> + attr->sample_simd_vec_reg_intr = opts->sample_intr_vec_regs;
> >>>>>>>> + attr->sample_simd_vec_reg_qwords = opts->sample_vec_regs_qwords;
> >>>>>>>> + attr->sample_simd_pred_reg_intr = opts->sample_intr_pred_regs;
> >>>>>>>> evsel__set_sample_bit(evsel, REGS_INTR);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> if (opts->sample_user_regs && !evsel->no_aux_samples &&
> >>>>>>>> !evsel__is_dummy_event(evsel)) {
> >>>>>>>> attr->sample_regs_user |= opts->sample_user_regs;
> >>>>>>>> + attr->sample_simd_regs_enabled = arch_has_simd_regs(attr->sample_regs_user);
> >>>>>>>> + evsel__set_sample_bit(evsel, REGS_USER);
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + if ((opts->sample_user_vec_regs || opts->sample_user_pred_regs) &&
> >>>>>>>> + !evsel->no_aux_samples && !evsel__is_dummy_event(evsel)) {
> >>>>>>>> + if (opts->sample_pred_regs_qwords)
> >>>>>>>> + attr->sample_simd_pred_reg_qwords = opts->sample_pred_regs_qwords;
> >>>>>>>> + else
> >>>>>>>> + attr->sample_simd_pred_reg_qwords = 1;
> >>>>>>>> + attr->sample_simd_vec_reg_user = opts->sample_user_vec_regs;
> >>>>>>>> + attr->sample_simd_vec_reg_qwords = opts->sample_vec_regs_qwords;
> >>>>>>>> + attr->sample_simd_pred_reg_user = opts->sample_user_pred_regs;
> >>>>>>>> evsel__set_sample_bit(evsel, REGS_USER);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> diff --git a/tools/perf/util/parse-regs-options.c b/tools/perf/util/parse-regs-options.c
> >>>>>>>> index cda1c620968e..0bd100392889 100644
> >>>>>>>> --- a/tools/perf/util/parse-regs-options.c
> >>>>>>>> +++ b/tools/perf/util/parse-regs-options.c
> >>>>>>>> @@ -4,19 +4,139 @@
> >>>>>>>> #include <stdint.h>
> >>>>>>>> #include <string.h>
> >>>>>>>> #include <stdio.h>
> >>>>>>>> +#include <linux/bitops.h>
> >>>>>>>> #include "util/debug.h"
> >>>>>>>> #include <subcmd/parse-options.h>
> >>>>>>>> #include "util/perf_regs.h"
> >>>>>>>> #include "util/parse-regs-options.h"
> >>>>>>>> +#include "record.h"
> >>>>>>>> +
> >>>>>>>> +static void __print_simd_regs(bool intr, uint64_t simd_mask)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + uint64_t bitmap = 0;
> >>>>>>>> + u16 qwords = 0;
> >>>>>>>> + int reg_idx;
> >>>>>>>> +
> >>>>>>>> + if (!simd_mask)
> >>>>>>>> + return;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_simd_reg_masks(); r->name; r++) {
> >>>>>>>> + if (!(r->mask & simd_mask))
> >>>>>>>> + continue;
> >>>>>>>> + reg_idx = fls64(r->mask) - 1;
> >>>>>>>> + if (intr)
> >>>>>>>> + bitmap = arch__intr_simd_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + else
> >>>>>>>> + bitmap = arch__user_simd_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + if (bitmap)
> >>>>>>>> + fprintf(stderr, "%s0-%d ", r->name, fls64(bitmap) - 1);
> >>>>>>>> + }
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void __print_pred_regs(bool intr, uint64_t pred_mask)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + uint64_t bitmap = 0;
> >>>>>>>> + u16 qwords = 0;
> >>>>>>>> + int reg_idx;
> >>>>>>>> +
> >>>>>>>> + if (!pred_mask)
> >>>>>>>> + return;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_pred_reg_masks(); r->name; r++) {
> >>>>>>>> + if (!(r->mask & pred_mask))
> >>>>>>>> + continue;
> >>>>>>>> + reg_idx = fls64(r->mask) - 1;
> >>>>>>>> + if (intr)
> >>>>>>>> + bitmap = arch__intr_pred_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + else
> >>>>>>>> + bitmap = arch__user_pred_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + if (bitmap)
> >>>>>>>> + fprintf(stderr, "%s0-%d ", r->name, fls64(bitmap) - 1);
> >>>>>>>> + }
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool __parse_simd_regs(struct record_opts *opts, char *s, bool intr)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + bool matched = false;
> >>>>>>>> + uint64_t bitmap = 0;
> >>>>>>>> + u16 qwords = 0;
> >>>>>>>> + int reg_idx;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_simd_reg_masks(); r->name; r++) {
> >>>>>>>> + if (strcasecmp(s, r->name))
> >>>>>>>> + continue;
> >>>>>>>> + if (!fls64(r->mask))
> >>>>>>>> + continue;
> >>>>>>>> + reg_idx = fls64(r->mask) - 1;
> >>>>>>>> + if (intr)
> >>>>>>>> + bitmap = arch__intr_simd_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + else
> >>>>>>>> + bitmap = arch__user_simd_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + matched = true;
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + /* Just need the highest qwords */
> >>>>>>>> + if (qwords > opts->sample_vec_regs_qwords) {
> >>>>>>>> + opts->sample_vec_regs_qwords = qwords;
> >>>>>>>> + if (intr)
> >>>>>>>> + opts->sample_intr_vec_regs = bitmap;
> >>>>>>>> + else
> >>>>>>>> + opts->sample_user_vec_regs = bitmap;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return matched;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static bool __parse_pred_regs(struct record_opts *opts, char *s, bool intr)
> >>>>>>>> +{
> >>>>>>>> + const struct sample_reg *r = NULL;
> >>>>>>>> + bool matched = false;
> >>>>>>>> + uint64_t bitmap = 0;
> >>>>>>>> + u16 qwords = 0;
> >>>>>>>> + int reg_idx;
> >>>>>>>> +
> >>>>>>>> + for (r = arch__sample_pred_reg_masks(); r->name; r++) {
> >>>>>>>> + if (strcasecmp(s, r->name))
> >>>>>>>> + continue;
> >>>>>>>> + if (!fls64(r->mask))
> >>>>>>>> + continue;
> >>>>>>>> + reg_idx = fls64(r->mask) - 1;
> >>>>>>>> + if (intr)
> >>>>>>>> + bitmap = arch__intr_pred_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + else
> >>>>>>>> + bitmap = arch__user_pred_reg_bitmap_qwords(reg_idx, &qwords);
> >>>>>>>> + matched = true;
> >>>>>>>> + break;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + /* Just need the highest qwords */
> >>>>>>>> + if (qwords > opts->sample_pred_regs_qwords) {
> >>>>>>>> + opts->sample_pred_regs_qwords = qwords;
> >>>>>>>> + if (intr)
> >>>>>>>> + opts->sample_intr_pred_regs = bitmap;
> >>>>>>>> + else
> >>>>>>>> + opts->sample_user_pred_regs = bitmap;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> + return matched;
> >>>>>>>> +}
> >>>>>>>>
> >>>>>>>> static int
> >>>>>>>> __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
> >>>>>>>> {
> >>>>>>>> uint64_t *mode = (uint64_t *)opt->value;
> >>>>>>>> const struct sample_reg *r = NULL;
> >>>>>>>> + struct record_opts *opts;
> >>>>>>>> char *s, *os = NULL, *p;
> >>>>>>>> - int ret = -1;
> >>>>>>>> + bool has_simd_regs = false;
> >>>>>>>> uint64_t mask;
> >>>>>>>> + uint64_t simd_mask;
> >>>>>>>> + uint64_t pred_mask;
> >>>>>>>> + int ret = -1;
> >>>>>>>>
> >>>>>>>> if (unset)
> >>>>>>>> return 0;
> >>>>>>>> @@ -27,10 +147,17 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
> >>>>>>>> if (*mode)
> >>>>>>>> return -1;
> >>>>>>>>
> >>>>>>>> - if (intr)
> >>>>>>>> + if (intr) {
> >>>>>>>> + opts = container_of(opt->value, struct record_opts, sample_intr_regs);
> >>>>>>>> mask = arch__intr_reg_mask();
> >>>>>>>> - else
> >>>>>>>> + simd_mask = arch__intr_simd_reg_mask();
> >>>>>>>> + pred_mask = arch__intr_pred_reg_mask();
> >>>>>>>> + } else {
> >>>>>>>> + opts = container_of(opt->value, struct record_opts, sample_user_regs);
> >>>>>>>> mask = arch__user_reg_mask();
> >>>>>>>> + simd_mask = arch__user_simd_reg_mask();
> >>>>>>>> + pred_mask = arch__user_pred_reg_mask();
> >>>>>>>> + }
> >>>>>>>>
> >>>>>>>> /* str may be NULL in case no arg is passed to -I */
> >>>>>>>> if (str) {
> >>>>>>>> @@ -50,10 +177,24 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
> >>>>>>>> if (r->mask & mask)
> >>>>>>>> fprintf(stderr, "%s ", r->name);
> >>>>>>>> }
> >>>>>>>> + __print_simd_regs(intr, simd_mask);
> >>>>>>>> + __print_pred_regs(intr, pred_mask);
> >>>>>>>> fputc('\n', stderr);
> >>>>>>>> /* just printing available regs */
> >>>>>>>> goto error;
> >>>>>>>> }
> >>>>>>>> +
> >>>>>>>> + if (simd_mask) {
> >>>>>>>> + has_simd_regs = __parse_simd_regs(opts, s, intr);
> >>>>>>>> + if (has_simd_regs)
> >>>>>>>> + goto next;
> >>>>>>>> + }
> >>>>>>>> + if (pred_mask) {
> >>>>>>>> + has_simd_regs = __parse_pred_regs(opts, s, intr);
> >>>>>>>> + if (has_simd_regs)
> >>>>>>>> + goto next;
> >>>>>>>> + }
> >>>>>>>> +
> >>>>>>>> for (r = arch__sample_reg_masks(); r->name; r++) {
> >>>>>>>> if ((r->mask & mask) && !strcasecmp(s, r->name))
> >>>>>>>> break;
> >>>>>>>> @@ -65,7 +206,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> *mode |= r->mask;
> >>>>>>>> -
> >>>>>>>> +next:
> >>>>>>>> if (!p)
> >>>>>>>> break;
> >>>>>>>>
> >>>>>>>> @@ -75,7 +216,7 @@ __parse_regs(const struct option *opt, const char *str, int unset, bool intr)
> >>>>>>>> ret = 0;
> >>>>>>>>
> >>>>>>>> /* default to all possible regs */
> >>>>>>>> - if (*mode == 0)
> >>>>>>>> + if (*mode == 0 && !has_simd_regs)
> >>>>>>>> *mode = mask;
> >>>>>>>> error:
> >>>>>>>> free(os);
> >>>>>>>> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> >>>>>>>> index 66b666d9ce64..fb0366d050cf 100644
> >>>>>>>> --- a/tools/perf/util/perf_event_attr_fprintf.c
> >>>>>>>> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> >>>>>>>> @@ -360,6 +360,12 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
> >>>>>>>> PRINT_ATTRf(aux_start_paused, p_unsigned);
> >>>>>>>> PRINT_ATTRf(aux_pause, p_unsigned);
> >>>>>>>> PRINT_ATTRf(aux_resume, p_unsigned);
> >>>>>>>> + PRINT_ATTRf(sample_simd_pred_reg_qwords, p_unsigned);
> >>>>>>>> + PRINT_ATTRf(sample_simd_pred_reg_intr, p_hex);
> >>>>>>>> + PRINT_ATTRf(sample_simd_pred_reg_user, p_hex);
> >>>>>>>> + PRINT_ATTRf(sample_simd_vec_reg_qwords, p_unsigned);
> >>>>>>>> + PRINT_ATTRf(sample_simd_vec_reg_intr, p_hex);
> >>>>>>>> + PRINT_ATTRf(sample_simd_vec_reg_user, p_hex);
> >>>>>>>>
> >>>>>>>> return ret;
> >>>>>>>> }
> >>>>>>>> diff --git a/tools/perf/util/perf_regs.c b/tools/perf/util/perf_regs.c
> >>>>>>>> index 44b90bbf2d07..e8a9fabc92e6 100644
> >>>>>>>> --- a/tools/perf/util/perf_regs.c
> >>>>>>>> +++ b/tools/perf/util/perf_regs.c
> >>>>>>>> @@ -11,6 +11,11 @@ int __weak arch_sdt_arg_parse_op(char *old_op __maybe_unused,
> >>>>>>>> return SDT_ARG_SKIP;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +bool __weak arch_has_simd_regs(u64 mask __maybe_unused)
> >>>>>>>> +{
> >>>>>>>> + return false;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> uint64_t __weak arch__intr_reg_mask(void)
> >>>>>>>> {
> >>>>>>>> return 0;
> >>>>>>>> @@ -21,6 +26,50 @@ uint64_t __weak arch__user_reg_mask(void)
> >>>>>>>> return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +uint64_t __weak arch__intr_simd_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__user_simd_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__intr_pred_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__user_pred_reg_mask(void)
> >>>>>>>> +{
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__intr_simd_reg_bitmap_qwords(int reg __maybe_unused, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__user_simd_reg_bitmap_qwords(int reg __maybe_unused, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__intr_pred_reg_bitmap_qwords(int reg __maybe_unused, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +uint64_t __weak arch__user_pred_reg_bitmap_qwords(int reg __maybe_unused, u16 *qwords)
> >>>>>>>> +{
> >>>>>>>> + *qwords = 0;
> >>>>>>>> + return 0;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> static const struct sample_reg sample_reg_masks[] = {
> >>>>>>>> SMPL_REG_END
> >>>>>>>> };
> >>>>>>>> @@ -30,6 +79,16 @@ const struct sample_reg * __weak arch__sample_reg_masks(void)
> >>>>>>>> return sample_reg_masks;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> +const struct sample_reg * __weak arch__sample_simd_reg_masks(void)
> >>>>>>>> +{
> >>>>>>>> + return sample_reg_masks;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +const struct sample_reg * __weak arch__sample_pred_reg_masks(void)
> >>>>>>>> +{
> >>>>>>>> + return sample_reg_masks;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> const char *perf_reg_name(int id, const char *arch)
> >>>>>>>> {
> >>>>>>>> const char *reg_name = NULL;
> >>>>>>>> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> >>>>>>>> index f2d0736d65cc..bce9c4cfd1bf 100644
> >>>>>>>> --- a/tools/perf/util/perf_regs.h
> >>>>>>>> +++ b/tools/perf/util/perf_regs.h
> >>>>>>>> @@ -24,9 +24,20 @@ enum {
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> int arch_sdt_arg_parse_op(char *old_op, char **new_op);
> >>>>>>>> +bool arch_has_simd_regs(u64 mask);
> >>>>>>>> uint64_t arch__intr_reg_mask(void);
> >>>>>>>> uint64_t arch__user_reg_mask(void);
> >>>>>>>> const struct sample_reg *arch__sample_reg_masks(void);
> >>>>>>>> +const struct sample_reg *arch__sample_simd_reg_masks(void);
> >>>>>>>> +const struct sample_reg *arch__sample_pred_reg_masks(void);
> >>>>>>> I wonder if we can remove these functions. perf_reg_name(int id, uint16_t
> >>>>>>> e_machine) maps a perf register number and e_machine to a string. So
> >>>>>>> the sample_reg array could be replaced with:
> >>>>>>> ```
> >>>>>>> for (int perf_reg = 0; perf_reg < 64; perf_reg++) {
> >>>>>>> uint64_t mask = 1ULL << perf_reg;
> >>>>>>> const char *name = perf_reg_name(perf_reg, EM_HOST);
> >>>>>>> if (name == NULL)
> >>>>>>> break;
> >>>>>>> // use mask and name
> >>>>>>> }
> >>>>>>> ```
> >>>>>>> To make it work for SIMD and PRED then I guess we need to iterate
> >>>>>>> through the ABIs of enum perf_sample_regs_abi.
> >>>>>> I suppose so.
> >>>>>>
> >>>>>>
> >>>>>>>> +uint64_t arch__intr_simd_reg_mask(void);
> >>>>>>>> +uint64_t arch__user_simd_reg_mask(void);
> >>>>>>>> +uint64_t arch__intr_pred_reg_mask(void);
> >>>>>>>> +uint64_t arch__user_pred_reg_mask(void);
> >>>>>>> I think some comments would be useful here like:
> >>>>>>> ```
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_regs_intr. */
> >>>>>>> uint64_t arch__intr_reg_mask(void);
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_regs_user. */
> >>>>>>> uint64_t arch__user_reg_mask(void);
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_simd_vec_reg_intr. */
> >>>>>>> uint64_t arch__intr_simd_reg_mask(void);
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_simd_vec_reg_user. */
> >>>>>>> uint64_t arch__user_simd_reg_mask(void);
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_simd_pred_reg_intr. */
> >>>>>>> uint64_t arch__intr_pred_reg_mask(void);
> >>>>>>> /* Perf register bit map with valid bits for
> >>>>>>> perf_event_attr.sample_simd_pred_reg_user. */
> >>>>>>> uint64_t arch__user_pred_reg_mask(void);
> >>>>>> Sure. Thanks.
> >>>>>>
> >>>>>>
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Why does arch__user_pred_reg_mask return a uint64_t when the
> >>>>>>> perf_event_attr variable is a __u32?
> >>>>>> I suppose it's a bug. :)
> >>>>>>
> >>>>>>
> >>>>>>>> +uint64_t arch__intr_simd_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> +uint64_t arch__user_simd_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> +uint64_t arch__intr_pred_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>>> +uint64_t arch__user_pred_reg_bitmap_qwords(int reg, u16 *qwords);
> >>>>>>> I don't understand this function. The qwords is specific to a
> >>>>>>> perf_event_attr. We could have an evlist with an evsel set up to
> >>>>>>> sample say XMM registers and another evsel set up to sample ZMM
> >>>>>>> registers. Are the qwords here always for the ZMM case, or is XMM,
> >>>>>>> YMM, ZMM depending on architecture support? Why does it vary per
> >>>>>>> register? The surrounding code uses the term mask but here bitmap is
> >>>>>>> used, is the inconsistency deliberate? Why are there user and intr
> >>>>>>> functions when in the perf_event_attr there are only
> >>>>>>> sample_simd_pred_reg_qwords and sample_simd_vec_reg_qwords variables?
> >>>>>> These 4 functions are designed to get the bitmask and qwords length for a
> >>>>>> specific kind of SIMD register. E.g., for XMM on x86 platforms, the returned
> >>>>>> bitmask is 0xffff (xmm0 ~ xmm15) and the qwords length is 2 (128 bits). For
> >>>>>> ZMM on x86 platforms, if the platform only supports 16 ZMM registers, then
> >>>>>> the returned bitmask is 0xffff (zmm0 ~ zmm15) and the qwords length is 8 (512
> >>>>>> bits). If the platform supports 32 ZMM registers, then the returned bitmask
> >>>>>> is 0xffffffff (zmm0 ~ zmm31) and the qwords length is 8 (512 bits).
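A minimal sketch of the lookup described above, assuming a hypothetical `struct simd_caps` that records how many registers of each class a platform has. The class enum and the function name are illustrative stand-ins, not the patch's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-class indexes (one per kind of SIMD register). */
enum simd_reg_class { SIMD_CLASS_XMM, SIMD_CLASS_YMM, SIMD_CLASS_ZMM, SIMD_CLASS_MAX };

/* Hypothetical platform description: registers per class, 0 = unsupported. */
struct simd_caps {
	unsigned int nr_regs[SIMD_CLASS_MAX];
};

/* Width of one register of each class, in 64-bit qwords. */
static const uint16_t simd_class_qwords[SIMD_CLASS_MAX] = { 2, 4, 8 };

/*
 * Return the bitmask of available registers of class @cls (one bit per
 * whole register) and store the register length in *qwords. Both are 0
 * when the class is not supported.
 */
static uint64_t simd_reg_bitmap_qwords(const struct simd_caps *caps,
				       enum simd_reg_class cls, uint16_t *qwords)
{
	unsigned int n;

	assert(cls < SIMD_CLASS_MAX);
	n = caps->nr_regs[cls];
	if (n == 0) {
		*qwords = 0;
		return 0;
	}
	*qwords = simd_class_qwords[cls];
	/* Avoid the undefined 1 << 64 shift when all 64 bits are needed. */
	return n >= 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```

With `nr_regs = {16, 16, 32}` the sketch reproduces the figures above: XMM gives 0xffff with 2 qwords and ZMM gives 0xffffffff with 8 qwords, while an unsupported class gives 0 for both.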
> >>>>> What is the meaning of reg? In this file it is normally the integer
> >>>>> index for a bit in the sample_regs_user mask, but for x86 I don't see
> >>>>> enum perf_event_x86_regs having differing XMM, YMM and ZMM encodings.
> >>>>> Similarly, is qwords an out argument, but then you also have the
> >>>>> bitmap. It looks like the code is caching values but that assumes a
> >>>>> single qword length for all events.
> >>>> Yes, the "reg" argument indicates the SIMD register index. Strictly
> >>>> speaking, on x86 the qwords length is fixed for a given kind of SIMD
> >>>> register and only the number of registers can vary, e.g., some platforms
> >>>> support only 16 ZMM registers, while others support 32 ZMM registers.
> >>>> But since this is a generic function for all archs, we can't be sure
> >>>> that a given kind of SIMD register has a fixed length on every arch, so
> >>>> I introduced the "qwords" argument for flexibility.
> >>> I'm still not understanding this :-) What is a "SIMD register
> >>> index", the file is for perf registers and naturally enum
> >>> perf_event_x86_regs on x86, but that doesn't encode YMM and ZMM
> >>> registers. Perhaps you can give some examples?
> >> Yes, it's just like the register index in enum
> >> perf_event_x86_regs, e.g. the index of the AX register is PERF_REG_X86_AX,
> >> the index of BX is PERF_REG_X86_BX, and so on.
> >>
> >> The difference is that each index in perf_event_x86_regs can only
> >> represent one u64 word. If we still wanted to represent the SIMD registers
> >> with the perf_event_x86_regs enum, each XMM register would need 2 indexes,
> >> each YMM register 4 indexes and each ZMM register 8 indexes. Since there
> >> are 16 XMM registers, 16 YMM registers and 32 ZMM registers, representing
> >> all these indexes would make enum perf_event_x86_regs quite large, and the
> >> sample_regs_intr/sample_regs_user fields in perf_event_attr would
> >> correspondingly have to grow a lot, consuming much more memory.
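The size argument above in numbers, as a rough sketch assuming one mask bit per 64-bit word (as with `perf_event_x86_regs`) and the register counts just quoted. `struct reg_class` and the helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One kind of SIMD register: how many registers, and how many u64 each. */
struct reg_class {
	unsigned int nr_regs;
	unsigned int qwords;
};

/*
 * Bits a one-bit-per-u64 mask (like sample_regs_intr) would need in order
 * to name every 64-bit chunk of every register in @classes.
 */
static unsigned int per_qword_mask_bits(const struct reg_class *classes, size_t n)
{
	unsigned int bits = 0;

	for (size_t i = 0; i < n; i++)
		bits += classes[i].nr_regs * classes[i].qwords;
	return bits;
}
```

For `{16, 2}, {16, 4}, {32, 8}` this comes to 32 + 64 + 256 = 352 bits, five and a half times the 64 bits available in `sample_regs_intr`.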
> >>
> >> That's why we introduced the new attributes below:
> >>
> >> + union {
> >> + __u16 sample_simd_regs_enabled;
> >> + __u16 sample_simd_pred_reg_qwords;
> >> + };
> >> + __u32 sample_simd_pred_reg_intr;
> >> + __u32 sample_simd_pred_reg_user;
> >> + __u16 sample_simd_vec_reg_qwords;
> >> + __u64 sample_simd_vec_reg_intr;
> >> + __u64 sample_simd_vec_reg_user;
> >> + __u32 __reserved_4;
> >>
> >> For SIMD registers, each kind of SIMD register is treated as a whole. The
> >> sample_simd_vec_reg_qwords field identifies the length of the SIMD
> >> register and at the same time hints which kind of SIMD register it is,
> >> since each kind of SIMD register has a different length. E.g., suppose we
> >> want to sample XMM registers. We know there are 16 XMM registers on x86
> >> and the qwords length of an XMM register is 2. So user space needs to set
> >> the attributes like this:
> >>
> >> sample_simd_vec_reg_intr = 0xffff;
> >>
> >> sample_simd_vec_reg_qwords = 2;
> >>
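The XMM example above as code. This is only a sketch: `struct simd_attr_fields` is a local stand-in that copies the new fields quoted in this message (the reserved field is omitted); it is not the real uapi `struct perf_event_attr`, and `request_intr_xmm()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the proposed SIMD fields of perf_event_attr. */
struct simd_attr_fields {
	union {
		uint16_t sample_simd_regs_enabled;
		uint16_t sample_simd_pred_reg_qwords;
	};
	uint32_t sample_simd_pred_reg_intr;
	uint32_t sample_simd_pred_reg_user;
	uint16_t sample_simd_vec_reg_qwords;
	uint64_t sample_simd_vec_reg_intr;
	uint64_t sample_simd_vec_reg_user;
};

/* Request all 16 XMM registers (2 qwords = 128 bits each) at interrupt time. */
static void request_intr_xmm(struct simd_attr_fields *f)
{
	f->sample_simd_vec_reg_intr = 0xffff;   /* xmm0 ~ xmm15, one bit each */
	f->sample_simd_vec_reg_qwords = 2;      /* register length: 128 bits  */
}
```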
> >> Coming back to the "reg" argument: multiple kinds of SIMD registers can
> >> be supported on an arch, e.g., x86 supports XMM, YMM, ZMM and OPMASK SIMD
> >> registers. As each kind of SIMD register is always sampled as a whole, we
> >> don't need to represent each individual SIMD register, like XMM0 or XMM1,
> >> but we do need to distinguish the different kinds of SIMD register, like
> >> XMM and YMM, since they have different register lengths and counts.
> >>
> >> That's why we define an index for each kind of SIMD register, like below:
> >>
> >> +enum {
> >> + PERF_REG_X86_XMM,
> >> + PERF_REG_X86_YMM,
> >> + PERF_REG_X86_ZMM,
> >> + PERF_REG_X86_MAX_SIMD_REGS,
> >> +
> >> + PERF_REG_X86_OPMASK = 0,
> >> + PERF_REG_X86_MAX_PRED_REGS = 1,
> >> +};
> >>
> >> It's similar to perf_event_x86_regs, but each index represents a kind of
> >> SIMD register instead of a specific SIMD register.
> > Could you give me an example call to say
> > arch__intr_simd_reg_bitmap_qwords where you say what the value of reg
> > is, what the expected value of qwords is and what the result will be?
> > Could you do it for say a model without AVX, a model with AVX, a model
> > with AVX512 and a model with APX.
>
> Assume we are on an x86 platform which only supports XMM registers (i.e. no AVX) and
> call the function arch__intr_simd_reg_bitmap_qwords() with each SIMD register index:
>
> 1. reg = PERF_REG_X86_XMM
>
> The return value (XMM registers bitmask) = 0xffff and the qwords = 2 (128 bits).
Thanks!
Can we rename PERF_REG_X86_XMM to, say, PERF_REG_CLASS_X86_XMM (and
similarly reg to reg_class)? Currently the name is very close to
PERF_REG_X86_XMM0, but that value is in a different enum.
So each bit of the bitmask stands for a whole register of qwords length,
whilst in the regular perf register mask each bit is one 64-bit qword.
> 2. reg = PERF_REG_X86_YMM
>
> The return value (YMM registers bitmask) = 0 and the qwords = 0 since YMM registers are not supported.
>
> 3. reg = PERF_REG_X86_ZMM
>
> The return value (ZMM registers bitmask) = 0 and the qwords = 0 since ZMM registers are not supported.
Ok.
> Assume we are on an x86 platform which supports XMM/YMM/ZMM registers (AVX512) and call the function arch__intr_simd_reg_bitmap_qwords() with each SIMD register index:
>
> 1. reg = PERF_REG_X86_XMM
>
> The return value (XMM registers bitmask) = 0xffff and the qwords = 2 (128 bits).
>
> 2. reg = PERF_REG_X86_YMM
>
> The return value (YMM registers bitmask) = 0xffff and the qwords = 4 (256 bits).
Ok, qwords got bigger.
> 3. reg = PERF_REG_X86_ZMM
>
> The return value (ZMM registers bitmask) = 0xffffffff and the qwords = 8 (512 bits). We assume this platform supports 32 ZMM registers (ZMM0 ~ ZMM31).
Wouldn't it then also support 32 YMM and XMM registers in the 2 cases above?
> As for APX, it has nothing to do with these 4 functions; whether it's supported is determined by the helpers arch__intr_reg_mask()/arch__user_reg_mask().
> e.g.,
>
> ```
> if (has_cap_simd_regs()) {
> mask |= __arch__reg_mask(PERF_SAMPLE_REGS_INTR,
> GENMASK_ULL(PERF_REG_X86_R31, PERF_REG_X86_R16),
> true);
> mask |= __arch__reg_mask(PERF_SAMPLE_REGS_INTR,
> BIT_ULL(PERF_REG_X86_SSP),
> true);
> }
> ```
> If the platform supports APX eGPRs, then the returned mask from arch__intr_reg_mask()/arch__user_reg_mask() would contain the eGPRs mask, otherwise, it would not.
Thanks for the explanation. In perf_regs.h why do YMMH and ZMMH
also appear, but not as arguments for arch__intr_simd_reg_bitmap_qwords?
> >
> > I have looked at the code and read the changes to perf_event_attr
> > which is why I was confused by your saying that ZMM could be passed in
> > as a perf register number. I am confused as to why, when the
> > perf_event_attr has 2 qword length related variables, this code seems
> > to be setting things up so that every register can have a qword
> > length. I'm confused what is happening with the return value of this
> > function. As values are being stored into global variables, and you
> > are saying they aren't a max value, then how does this impact the
> > setting up of multiple register sampling events?
>
> There are 2 qwords lengths: sample_simd_pred_reg_qwords stores the PRED
> register length, and sample_simd_vec_reg_qwords stores the SIMD
> register length. Since a SIMD/PRED register with a larger length
> contains the SIMD/PRED register with a shorter length, only the largest
> length is set into the variables
> sample_simd_pred_reg_qwords/sample_simd_vec_reg_qwords.
>
> E.g.,
>
> perf record -e cycles:p -Ixmm,ymm,zmm -c 10000 -- sleep 1
>
> The sample_simd_vec_reg_qwords would be set to 8 to represent the largest
> length (ZMM) and kernel directly samples ZMM registers since ZMM registers
> fully contains YMM and XMM registers.
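A sketch of how `-Ixmm,ymm,zmm` could collapse into a single request under the rule described above, mirroring the patch's "Just need the highest qwords" comparison: only the class with the largest qwords is kept. The per-class table uses the AVX-512 figures from this thread with 32 ZMM registers assumed; the table, structs and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/* Hypothetical per-class results, as the bitmap/qwords helpers might report. */
struct simd_class_info {
	const char *name;
	uint64_t bitmap;
	uint16_t qwords;
};

static const struct simd_class_info avx512_classes[] = {
	{ "xmm", 0xffff,     2 },
	{ "ymm", 0xffff,     4 },
	{ "zmm", 0xffffffff, 8 },
};

struct simd_request {
	uint64_t bitmap;
	uint16_t qwords;
};

/*
 * Fold one -I register-class name into @req, keeping only the widest class
 * seen so far. Returns 1 if @name matched a known class, 0 otherwise.
 */
static int add_simd_class(struct simd_request *req, const char *name)
{
	size_t n = sizeof(avx512_classes) / sizeof(avx512_classes[0]);

	for (size_t i = 0; i < n; i++) {
		const struct simd_class_info *c = &avx512_classes[i];

		if (strcasecmp(name, c->name))
			continue;
		if (c->qwords > req->qwords) {
			req->qwords = c->qwords;
			req->bitmap = c->bitmap;
		}
		return 1;
	}
	return 0;
}
```

Whatever the order of the classes on the command line, the result is qwords = 8 with bitmap 0xffffffff, i.e. the ZMM request.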
>
> The reason for caching the bitmask and qwords is that they are fixed
> for a given SIMD/PRED register kind on a certain x86 platform, right?
> E.g. the qwords length of an XMM register is always 2, YMM is 4, etc...
Ok. My confusion is the overloaded meaning of a perf register in this
file, hence it'd be nice to make the names more distinct.
> The bitmask and qwords values are retrieved from kernel by
> perf_event_open() syscall which is quite expensive, if it's called
> frequently, it would impact the performance heavily.
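The caching argument, sketched: probe each register class once and reuse the answer. `probe_simd_class()` is a hypothetical placeholder for the expensive `perf_event_open()`-based probe (it just returns fixed values and counts its calls), and the static cache is an assumption rather than the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CLASS_XMM, CLASS_YMM, CLASS_ZMM, CLASS_MAX };

static unsigned int probe_calls;  /* how often the expensive probe ran */

/* Placeholder for a perf_event_open()-based probe; returns fixed values. */
static uint64_t probe_simd_class(int cls, uint16_t *qwords)
{
	static const uint16_t q[CLASS_MAX] = { 2, 4, 8 };
	static const uint64_t m[CLASS_MAX] = { 0xffff, 0xffff, 0xffffffff };

	probe_calls++;
	*qwords = q[cls];
	return m[cls];
}

/* Memoized lookup: the probe runs at most once per register class. */
static uint64_t simd_class_bitmap_qwords(int cls, uint16_t *qwords)
{
	static bool cached[CLASS_MAX];
	static uint64_t bitmap[CLASS_MAX];
	static uint16_t nq[CLASS_MAX];

	assert(cls >= 0 && cls < CLASS_MAX);
	if (!cached[cls]) {
		bitmap[cls] = probe_simd_class(cls, &nq[cls]);
		cached[cls] = true;
	}
	*qwords = nq[cls];
	return bitmap[cls];
}
```

The static cache is not thread-safe, which is acceptable for a sketch where option parsing happens on a single thread.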
Agreed. It is a shame the existing probing/caching aren't used.
Thanks,
Ian
> >
> > Thanks,
> > Ian
> >
> >>> How does the generic differing qword per register case get encoded
> >>> into a perf_event_attr? If it can't be then this seems like
> >>> functionality for no benefit. I also don't understand how the data in
> >>> the PERF_SAMPLE_REGS_USER part of a sample could be decoded as that is
> >>> assuming a constant qword number.
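To illustrate why a single qword count matters for decoding, here is a sketch under an assumed layout: the sampled registers are stored back to back in ascending bit order, each `qwords` u64s wide. The layout and `simd_reg_offset()` are assumptions for illustration, not the documented sample format. A register's offset is the number of sampled registers below it times `qwords`, which only works if every register in the dump shares one qwords value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset, in u64 units, of register @reg's data within a dump that holds
 * every register set in @mask, each @qwords u64s long, in ascending bit
 * order (assumed layout). Returns -1 if @reg was not sampled.
 */
static long simd_reg_offset(uint64_t mask, unsigned int reg, uint16_t qwords)
{
	assert(reg < 64);
	if (!(mask & (UINT64_C(1) << reg)))
		return -1;
	/* Sampled registers that precede @reg in the dump. */
	uint64_t below = mask & ((UINT64_C(1) << reg) - 1);
	/* __builtin_popcountll is a GCC/Clang builtin. */
	return (long)__builtin_popcountll(below) * qwords;
}
```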
> >>>
> >>>> No, qwords is set to the real register length if the register
> >>>> exists on the platform, e.g., xmm = 2, ymm = 4 and zmm = 8. If the
> >>>> register is not supported on the platform, qwords is set to 0.
> >>> So it is a max function of the vector/pred qwords supported on the architecture.
> >> Strictly speaking, it's not "max" function of the vector/pred qwords, it's
> >> just a function to get the exact vector/pred qwords supported on the
> >> architecture since qwords length won't vary for a fixed kind of SIMD register.
> >>
> >>
> >>>>>> Since the qword length is always fixed for a given SIMD register,
> >>>>>> regardless of intr or user, there is only one
> >>>>>> sample_simd_pred_reg_qwords or sample_simd_vec_reg_qwords variable.
> >>>>> Ok. 2 variables, but 4 functions here. I think there should just be 2
> >>>>> because of this.
> >>>> Yes, the user and intr variants would be merged into only one.
> >>> Thanks,
> >>> Ian
> >>>
> >>>>> Thanks,
> >>>>> Ian
> >>>>>
> >>>>>>> Perhaps these functions should be something more like:
> >>>>>>> ```
> >>>>>>> /* Maximum value that can be assigned to
> >>>>>>> perf_event_attr.sample_simd_pred_reg_qwords. */
> >>>>>>> uint16_t arch__simd_pred_reg_qwords_max(void);
> >>>>>>> /* Maximum value that can be assigned to
> >>>>>>> perf_event_attr.sample_simd_vec_reg_qwords. */
> >>>>>>> uint16_t arch__simd_vec_reg_qwords_max(void);
> >>>>>>> ```
> >>>>>>> Then the bitmap computation logic can all be moved into parse-regs-options.c.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Ian
> >>>>>>>
> >>>>>>>> const char *perf_reg_name(int id, const char *arch);
> >>>>>>>> int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
> >>>>>>>> diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
> >>>>>>>> index ea3a6c4657ee..825ffb4cc53f 100644
> >>>>>>>> --- a/tools/perf/util/record.h
> >>>>>>>> +++ b/tools/perf/util/record.h
> >>>>>>>> @@ -59,7 +59,13 @@ struct record_opts {
> >>>>>>>> unsigned int user_freq;
> >>>>>>>> u64 branch_stack;
> >>>>>>>> u64 sample_intr_regs;
> >>>>>>>> + u64 sample_intr_vec_regs;
> >>>>>>>> u64 sample_user_regs;
> >>>>>>>> + u64 sample_user_vec_regs;
> >>>>>>>> + u16 sample_pred_regs_qwords;
> >>>>>>>> + u16 sample_vec_regs_qwords;
> >>>>>>>> + u16 sample_intr_pred_regs;
> >>>>>>>> + u16 sample_user_pred_regs;
> >>>>>>>> u64 default_interval;
> >>>>>>>> u64 user_interval;
> >>>>>>>> size_t auxtrace_snapshot_size;
> >>>>>>>> --
> >>>>>>>> 2.34.1
> >>>>>>>>