linux-kernel - Re: [PATCH 1/3] perf test: Support arch-specific shell tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWUZ78s1DMzQV8SP0U8Hb=TnCwB1_t2_o+3UcoJeyi=0Q@mail.gmail.com>
Date: Fri, 23 May 2025 10:54:02 -0700
From: Ian Rogers <irogers@...gle.com>
To: James Clark <james.clark@...aro.org>
Cc: Namhyung Kim <namhyung@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>, 
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>, 
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 1/3] perf test: Support arch-specific shell tests

On Fri, May 23, 2025 at 3:48 AM James Clark <james.clark@...aro.org> wrote:
>
> On 22/05/2025 9:09 pm, Ian Rogers wrote:
> > On Thu, May 22, 2025 at 10:10 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >>
> >> This is a preparation for shell tests belong to an arch.
> >
> > I keep repeating that I don't like arch and I think ideally we'd be
> > getting rid of the C arch tests. I just sent out a patch doing this
> > for 1 test:
> > https://lore.kernel.org/lkml/20250521165317.713463-2-irogers@google.com/
> > We should be able to make perf, tests, etc. dependent on a PMU rather
> > than an architecture. This means that running perf built for ARM will
> > be able to do things running on an instruction emulator on x86. It
>
> In this case for Arm SPE and Coresight you can only generate trace by
> running on a full model or a real CPU, so I'm not sure if we could ever
> get close to running on just an emulator.

Ok, that's the SPE and coresight PMUs, shouldn't the bulk of "perf
test" pass under an emulator and those tests skip when the PMUs aren't
there?

> > means the tool, the kernel APIs, etc. are generic and new
> > architectures like RISC-V can test things. It means cross-platform
> > (record on 1 machine type, report on another) can work without
> > tripping over load bearing architecture ifdefs. It means that we
>
> I have thought about adding some generic decoding side tests for SPE and
> Coresight, but couldn't really get past the fact that you need to put
> the trace dump _and_ the binaries traced into the git repo. Not only
> would this benefit testing on other arches like you say, but it would
> also lock down that decoding of a known file doesn't regress which we
> can't currently do by generating new trace every time the test runs.
>
> If we ever added this they would be separate tests though so they could
> go in the top level folder, where the ones in the arch folder would
> continue to do record and decode. Maybe naming the folders by PMU could
> work, but you could also have both PMU name and arch name folders like:
>
> Recording/requires hardware:
>
>    tools/perf/arch/arm64/tests/shell/cs_etm/
>
> Cross platform decode tests:
>
>    tools/perf/tests/shell/cs_etm/
>
> Which would mirror how the source files are currently laid out:
>
>   tools/perf/arch/arm/util/cs-etm.c
>   tools/perf/util/cs-etm.c

So this a digression, I don't necessarily disagree with it. Back to
adding arch-specific shell tests, we should be working hard to not
target work to a particular architecture. Making code, tests, etc.
specific to an architecture means someone, quite often myself, has to
come along later and undo the damage. Hybrid being a very large and
long case in point where >1 core PMU was hard coded to only work
within the tool for two specific PMUs. To be specific as to why
notions of architecture in the code base are actively harmful (off the
top of my head):

1) architecture isn't a specific enough term as PMUs are always
evolving so we need to support figuring this out, not just hiding the
affected code by an ifdef or build option;

2) lower and more problematic test coverage (will we shellcheck an ARM
test on x86? we're lowering the container coverage or requiring it to
be run on multiple architectures);

3) the tests need to be able to handle being on the architecture with
or without the PMU (such as under a hypervisor);
3.1) the without PMU case is a very frequent source of my patching
tests as most people test on a bare metal development machines and
forget continuous testing environments;
3.2) this kind of support can enable continuous testing efforts that
are run under an emulator. If a cloud has no ARM offering they can at
least emulator test, and SuSE have at least offered this. Google may
or may not run testing through emulators and tests need to handle it,
we have and do carry patches for this;

4) broken cross-platform support, such as with sample parsing where
decoding a sample is missing pieces when not done on x86 (come on,
this is terrible and we don't want more of it);

5) the tests do and should make use of common library code for things
like checking the perf event permissions, this code lives in
tools/perf/tests/shell/lib but the arch dependencies go where? We
shouldn't reinvent the shell testing environment per architecture.


I get that vendors would quite like things not to work on other
architectures, or for event parsing [1], testing, etc. to be in some
way crippled and require architecture specific reinvention. Why should
their engineers invest in infrastructure and testing that could
benefit someone else? Heck sometimes it feels like vendors may be
doing everything in their power via broken PMU names, event names,
etc. to make common infrastructure break when not run in their arch
specific way [2]? I see things differently and indeed when things go
wrong vendors argue that the generic code/behaviors needs to be
changed (I'm thinking of Mark Rutland on Apple PMU support and the
generic behavior changes made at that time that are now not carried
through leading to event parsing and display inconsistencies as well
as RISC-V needing to adopt a different event lookup strategy).

I want the perf tool to have its behavior be generic. The kernel has
an arch directory for understandable reasons, for the perf tool the
key thing is the PMU and we've been building out the infrastructure
for this. Should the perf tool have every event, metric or syscall
number for every architecture? For cross-platform work it doesn't seem
like a completely terrible idea. I'm sensitive to binary size [3] so
I'm not pushing on it.

Anyway, this change is unnecessary as the current skipping behavior
works and is better for the reasons enumerated above. This change gets
a very large nack from me. I'm supportive of more IBS coverage in the
rest of the series, but this change isn't required for that.

Thanks,
Ian

[1] https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
[2] https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_cspmu/arm_cspmu.c?h=perf-tools-next#n150
- please call it cpu_cycles to avoid ambiguity with the legacy event.
[3] https://lore.kernel.org/lkml/20250417230740.86048-1-irogers@google.com/



> Thanks
> James
>
> > benefit from more testing on generic pieces of code on all
> > architectures - like sample parsing. We can always strcmp the PMU name
> > or the architecture at runtime.
> >
> > Structure wise we could have:
> > tools/perf/pmu/ibs_op/tests/
> > tools/perf/pmu/ibs_op/tests/shell
> >
> > It feels noisy compared to just having the shell test in
> > tools/perf/tests/shell skip when the PMU isn't present. There are also
> > things like library dependencies that aren't clear when we have >1
> > directory. I'd prefer if new testing followed the existing model
> > rather than this.
> >
> > Thanks,
> > Ian
> >