linux-kernel - Re: [PATCH 1/3] perf test: Support arch-specific shell tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aDDpsC2sUQaC7LuZ@google.com>
Date: Fri, 23 May 2025 14:33:36 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: James Clark <james.clark@...aro.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 1/3] perf test: Support arch-specific shell tests

Hi Ian,

On Fri, May 23, 2025 at 10:54:02AM -0700, Ian Rogers wrote:
> On Fri, May 23, 2025 at 3:48 AM James Clark <james.clark@...aro.org> wrote:
> >
> > On 22/05/2025 9:09 pm, Ian Rogers wrote:
> > > On Thu, May 22, 2025 at 10:10 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > >>
> > >> This is a preparation for shell tests belong to an arch.
> > >
> > > I keep repeating that I don't like arch and I think ideally we'd be
> > > getting rid of the C arch tests. I just sent out a patch doing this
> > > for 1 test:
> > > https://lore.kernel.org/lkml/20250521165317.713463-2-irogers@google.com/
> > > We should be able to make perf, tests, etc. dependent on a PMU rather
> > > than an architecture. This means that running perf built for ARM will
> > > be able to do things running on an instruction emulator on x86. It
> >
> > In this case for Arm SPE and Coresight you can only generate trace by
> > running on a full model or a real CPU, so I'm not sure if we could ever
> > get close to running on just an emulator.
> 
> Ok, that's the SPE and coresight PMUs, shouldn't the bulk of "perf
> test" pass under an emulator and those tests skip when the PMUs aren't
> there?
> 
> > > means the tool, the kernel APIs, etc. are generic and new
> > > architectures like RISC-V can test things. It means cross-platform
> > > (record on 1 machine type, report on another) can work without
> > > tripping over load bearing architecture ifdefs. It means that we
> >
> > I have thought about adding some generic decoding side tests for SPE and
> > Coresight, but couldn't really get past the fact that you need to put
> > the trace dump _and_ the binaries traced into the git repo. Not only
> > would this benefit testing on other arches like you say, but it would
> > also lock down that decoding of a known file doesn't regress which we
> > can't currently do by generating new trace every time the test runs.
> >
> > If we ever added this they would be separate tests though so they could
> > go in the top level folder, where the ones in the arch folder would
> > continue to do record and decode. Maybe naming the folders by PMU could
> > work, but you could also have both PMU name and arch name folders like:
> >
> > Recording/requires hardware:
> >
> >    tools/perf/arch/arm64/tests/shell/cs_etm/
> >
> > Cross platform decode tests:
> >
> >    tools/perf/tests/shell/cs_etm/
> >
> > Which would mirror how the source files are currently laid out:
> >
> >   tools/perf/arch/arm/util/cs-etm.c
> >   tools/perf/util/cs-etm.c
> 
> So this a digression, I don't necessarily disagree with it. Back to
> adding arch-specific shell tests, we should be working hard to not
> target work to a particular architecture. Making code, tests, etc.
> specific to an architecture means someone, quite often myself, has to
> come along later and undo the damage. Hybrid being a very large and
> long case in point where >1 core PMU was hard coded to only work
> within the tool for two specific PMUs. To be specific as to why
> notions of architecture in the code base are actively harmful (off the
> top of my head):
> 
> 1) architecture isn't a specific enough term as PMUs are always
> evolving so we need to support figuring this out, not just hiding the
> affected code by an ifdef or build option;
> 
> 2) lower and more problematic test coverage (will we shellcheck an ARM
> test on x86? we're lowering the container coverage or requiring it to
> be run on multiple architectures);
> 
> 3) the tests need to be able to handle being on the architecture with
> or without the PMU (such as under a hypervisor);
> 3.1) the without PMU case is a very frequent source of my patching
> tests as most people test on a bare metal development machines and
> forget continuous testing environments;
> 3.2) this kind of support can enable continuous testing efforts that
> are run under an emulator. If a cloud has no ARM offering they can at
> least emulator test, and SuSE have at least offered this. Google may
> or may not run testing through emulators and tests need to handle it,
> we have and do carry patches for this;
> 
> 4) broken cross-platform support, such as with sample parsing where
> decoding a sample is missing pieces when not done on x86 (come on,
> this is terrible and we don't want more of it);
> 
> 5) the tests do and should make use of common library code for things
> like checking the perf event permissions, this code lives in
> tools/perf/tests/shell/lib but the arch dependencies go where? We
> shouldn't reinvent the shell testing environment per architecture.
> 
> 
> I get that vendors would quite like things not to work on other
> architectures, or for event parsing [1], testing, etc. to be in some
> way crippled and require architecture specific reinvention. Why should
> their engineers invest in infrastructure and testing that could
> benefit someone else? Heck sometimes it feels like vendors may be
> doing everything in their power via broken PMU names, event names,
> etc. to make common infrastructure break when not run in their arch
> specific way [2]? I see things differently and indeed when things go
> wrong vendors argue that the generic code/behaviors needs to be
> changed (I'm thinking of Mark Rutland on Apple PMU support and the
> generic behavior changes made at that time that are now not carried
> through leading to event parsing and display inconsistencies as well
> as RISC-V needing to adopt a different event lookup strategy).
> 
> I want the perf tool to have its behavior be generic. The kernel has
> an arch directory for understandable reasons, for the perf tool the
> key thing is the PMU and we've been building out the infrastructure
> for this. Should the perf tool have every event, metric or syscall
> number for every architecture? For cross-platform work it doesn't seem
> like a completely terrible idea. I'm sensitive to binary size [3] so
> I'm not pushing on it.
> 
> Anyway, this change is unnecessary as the current skipping behavior
> works and is better for the reasons enumerated above. This change gets
> a very large nack from me. I'm supportive of more IBS coverage in the
> rest of the series, but this change isn't required for that.

Thanks for the review.  I understand your concern and it'd be nice to
increase the cross-platform coverage.  I don't know if AMD IBS is
available in an emulator but it should be fine to have this test in the
generic place.

Thanks,
Namhyung

> 
> [1] https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> [2] https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_cspmu/arm_cspmu.c?h=perf-tools-next#n150
> - please call it cpu_cycles to avoid ambiguity with the legacy event.
> [3] https://lore.kernel.org/lkml/20250417230740.86048-1-irogers@google.com/
> 
> 
> 
> > Thanks
> > James
> >
> > > benefit from more testing on generic pieces of code on all
> > > architectures - like sample parsing. We can always strcmp the PMU name
> > > or the architecture at runtime.
> > >
> > > Structure wise we could have:
> > > tools/perf/pmu/ibs_op/tests/
> > > tools/perf/pmu/ibs_op/tests/shell
> > >
> > > It feels noisy compared to just having the shell test in
> > > tools/perf/tests/shell skip when the PMU isn't present. There are also
> > > things like library dependencies that aren't clear when we have >1
> > > directory. I'd prefer if new testing followed the existing model
> > > rather than this.
> > >
> > > Thanks,
> > > Ian
> > >