linux-kernel - Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format

Message-ID: <CAP-5=fUNeJpj-rdi+552-o9xhCiT41q-Q8Zhg8U7ev0Wbiyhog@mail.gmail.com>
Date: Mon, 19 Jan 2026 12:25:48 -0800
From: Ian Rogers <irogers@...gle.com>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Adrian Hunter <adrian.hunter@...el.com>, Jiri Olsa <jolsa@...nel.org>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Andi Kleen <ak@...ux.intel.com>, 
	Eranian Stephane <eranian@...gle.com>, Mark Rutland <mark.rutland@....com>, broonie@...nel.org, 
	Ravi Bangoria <ravi.bangoria@....com>, linux-kernel@...r.kernel.org, 
	linux-perf-users@...r.kernel.org, Zide Chen <zide.chen@...el.com>, 
	Falcon Thomas <thomas.falcon@...el.com>, Dapeng Mi <dapeng1.mi@...el.com>, 
	Xudong Hao <xudong.hao@...el.com>, Kan Liang <kan.liang@...ux.intel.com>
Subject: Re: [Patch v5 18/19] perf parse-regs: Support new SIMD sampling format

On Sun, Jan 18, 2026 at 10:55 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
>
>
> On 1/17/2026 1:50 PM, Ian Rogers wrote:
> > On Mon, Jan 5, 2026 at 11:27 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
> >> Ian,
> >>
> >> I looked at these perf regs __weak helpers again, like
> >> arch__intr_reg_mask()/arch__user_reg_mask(). It could be really hard to
> >> eliminate these __weak helpers and convert them into a generic function
> >> like perf_reg_name(). All these __weak helpers are arch-dependent and
> >> usually need to call the perf_event_open syscall to get the required
> >> register mask. So even if we convert them into a generic function, we
> >> still have no way to get the register mask of a different arch, e.g. the
> >> x86 register mask on an arm machine. Another reason is that these __weak
> >> helpers may contain arch-specific instructions. If we convert them into a
> >> generic perf function like perf_reg_name(), it may cause build errors,
> >> since these arch-specific instructions may not exist on the build machine.
> > Hi Dapeng,
> >
> > There was already a patch to better support cross architecture
> > libdw-unwind-ing and I've just sent out a series to clean this up so
> > that this is achieved by having mapping functions between perf and
> > dwarf register names. The functions use the e_machine of the binary to
> > determine how to map, etc. The series is here:
> > https://lore.kernel.org/lkml/20260117052849.2205545-1-irogers@google.com/
> > and I think it can be the foundation for avoiding the weak functions.
>
> Hi Ian,
>
> Thanks for the reference patch. But they are different. The reference
> patches mainly parse the regs from perf.data, and the __weak functions can
> be eliminated in the parsing phase since the register bitmap is fixed for
> a given arch. These __weak functions,
> arch__intr_reg_mask()/arch__user_reg_mask(), are instead used to obtain the
> supported sampling registers on a specific platform.
>
> We know different platforms, even of the same arch, may support different
> registers, e.g., some x86 platforms may only support XMM registers, but
> others may support XMM/YMM/ZMM registers, so all these arch-specific
> arch__intr_reg_mask()/arch__user_reg_mask() functions have to depend on the
> perf_event_open() syscall to retrieve the supported register mask from the
> kernel.
>
> Thus, it is impossible to retrieve the supported register mask of a
> specific x86 platform while running on an arm platform.
>
> Even if we ignore this limitation and forcibly convert the __weak
> arch__intr_reg_mask() function into something like the function below,
> similar to what perf_reg_name() currently does:
>
> uint64_t perf_intr_reg_mask(const char *arch)
> {
>     uint64_t mask = 0;
>
>     if (!strcmp(arch, "csky"))
>         mask = perf_intr_reg_mask_csky();
>     else if (!strcmp(arch, "loongarch"))
>         mask = perf_intr_reg_mask_loongarch();
>     else if (!strcmp(arch, "mips"))
>         mask = perf_intr_reg_mask_mips();
>     else if (!strcmp(arch, "powerpc"))
>         mask = perf_intr_reg_mask_powerpc();
>     else if (!strcmp(arch, "riscv"))
>         mask = perf_intr_reg_mask_riscv();
>     else if (!strcmp(arch, "s390"))
>         mask = perf_intr_reg_mask_s390();
>     else if (!strcmp(arch, "x86"))
>         mask = perf_intr_reg_mask_x86();
>     else if (!strcmp(arch, "arm"))
>         mask = perf_intr_reg_mask_arm();
>     else if (!strcmp(arch, "arm64"))
>         mask = perf_intr_reg_mask_arm64();
>
>     return mask;
> }
>
> But there are currently some arch-dependent instructions in these
> arch-specific functions, like the code below in the powerpc-specific
> arch__intr_reg_mask():
>
>     version = (((mfspr(SPRN_PVR)) >>  16) & 0xFFFF);
>
> mfspr is a powerpc-specific instruction, so building this converted
> perf_intr_reg_mask() on a non-powerpc platform would lead to a build error.

Hi Dapeng,

So my main point is about the arch directory and ifdefs: how do they
differ from writing code that switches on the ELF machine? For example,
your code uses the arch/x86 directory and has ifdefs on
HAVE_ARCH_X86_64_SUPPORT. How is that different from:
```
switch (e_machine) {
case EM_X86_64:
	...
case EM_386:
	...
default:
	return 0;
}
```
If we need to determine this for the currently running machine, then
e_machine can be EM_HOST, which is set up for this purpose.
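
To make that concrete, here is a minimal sketch of the shape I have in
mind. Everything prefixed with sketch_/SKETCH_ is hypothetical:
SKETCH_EM_HOST only stands in for the real EM_HOST, and the mask values
are meaningless placeholders rather than the real x86 register masks.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical stand-in for EM_HOST: the ELF machine of the build host. */
#if defined(__x86_64__)
#define SKETCH_EM_HOST EM_X86_64
#elif defined(__i386__)
#define SKETCH_EM_HOST EM_386
#elif defined(__aarch64__)
#define SKETCH_EM_HOST EM_AARCH64
#else
#define SKETCH_EM_HOST EM_NONE
#endif

/* Placeholder values only; real masks would come from the arch perf_regs.h. */
#define SKETCH_X86_64_MASK 0xffffULL
#define SKETCH_I386_MASK   0x00ffULL

/* Generic entry point: compiled on every host, no arch directory needed. */
uint64_t sketch_intr_reg_mask(uint16_t e_machine)
{
	switch (e_machine) {
	case EM_X86_64:
		return SKETCH_X86_64_MASK;
	case EM_386:
		return SKETCH_I386_MASK;
	default:
		/* Unknown or unsupported machine: no registers to sample. */
		return 0;
	}
}
```

A caller that cares about the running machine would pass SKETCH_EM_HOST;
a caller processing a perf.data file from another machine would pass the
e_machine of the recorded binary.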

I agree that determining features needs calls that may not be
supported on other architectures. That should yield EOPNOTSUPP and we
can use information like that to populate generic information like the
PMU missing features:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n190
We also probe API support with:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/perf_api_probe.h?h=perf-tools-next
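
As a rough sketch of the probing idea (not the existing perf_api_probe
code), something like the following could try a register mask and report
whether it is supported. The sketch_ names are hypothetical, the choice
of a software cpu-clock event is my assumption, and I am assuming an
unsupported mask fails with EOPNOTSUPP as described above; the opener is
a parameter only so the logic can be exercised without a real PMU.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical hook so the open call can be replaced in tests. */
typedef int (*sketch_open_fn)(struct perf_event_attr *attr);

/* Default opener: the real perf_event_open syscall on the current task. */
static int sketch_sys_open(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0);
}

/*
 * Returns 1 if the interrupt register mask is accepted, 0 if it is
 * reported as unsupported (EOPNOTSUPP), and -errno for any other
 * failure such as EACCES.
 */
int sketch_probe_intr_regs(uint64_t mask, sketch_open_fn open_fn)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_CPU_CLOCK;
	attr.sample_period = 1;
	attr.sample_type = PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr = mask;
	attr.disabled = 1;

	fd = (open_fn ? open_fn : sketch_sys_open)(&attr);
	if (fd >= 0) {
		close(fd);
		return 1;
	}
	return errno == EOPNOTSUPP ? 0 : -errno;
}
```

The result of a probe like this could then be cached in a generic
"missing features" style structure instead of living behind a weak
arch function.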

The current code does lots of string comparisons, which is unnecessary
overhead and imprecise ("x86" is used for both 32-bit and 64-bit x86).
It is removed in the series I linked to. I think we can eventually get
rid of the whole arch string for similar reasons, as part of trying to
minimize the use of the arch directory. I'm curious what happens with
APX: will the e_machine change? We may need to pass in the sample
regs_dump's abi field for cases like this.

My point on the unwinding is that the sample register mask appears to
be set up the same regardless, whereas for stack samples
(--call-graph=dwarf) maybe just sample IP and SP suffices. So perhaps
there should be additional registers to set up the sample mask.
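
For illustration only, a mask limited to IP and SP on x86 would look
like the sketch below. The SKETCH_ constants are local copies of what I
believe are the x86 indices in the uapi asm/perf_regs.h enum (SP = 7,
IP = 8, XMM0 = 32), duplicated so the snippet builds on any host.
Whether IP and SP alone are enough for the unwinder is still the open
question.

```c
#include <stdint.h>

/* Local copies of the x86 perf register indices (assumed values). */
enum {
	SKETCH_REG_X86_SP = 7,
	SKETCH_REG_X86_IP = 8,
	SKETCH_REG_X86_XMM0 = 32,
};

/* Candidate sample mask for --call-graph=dwarf: IP and SP only. */
uint64_t sketch_unwind_only_mask(void)
{
	return (1ULL << SKETCH_REG_X86_IP) | (1ULL << SKETCH_REG_X86_SP);
}

/* True if the mask asks for any register at or above XMM0. */
int sketch_mask_has_simd(uint64_t mask)
{
	return (mask >> SKETCH_REG_X86_XMM0) != 0;
}
```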

By avoiding the arch functions we can avoid the problem of broken
cross-architecture support, and we can also lay the groundwork for
support on other architectures that may want to do similar things. I
agree that this doesn't matter until more than one architecture is
trying to have more register masks. My concern is keeping the code
generic and making sure cross-architecture use keeps working. Adding new
weak functions goes in the opposite direction.

Thanks,
Ian

> -Dapeng Mi
>
> >
> > I also noticed that I think we're sampling the XMM registers for dwarf
> > unwinding, but it seems unlikely the XMM registers will hold stack
> > frame information - so this is probably an x86 inefficiency.
> >
> > Thanks,
> > Ian
> >