[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU_NCohskhq=rh1wLPxWmSuZcTw-hzwOhBSPBn4cfg4MA@mail.gmail.com>
Date: Tue, 7 Jan 2025 08:46:02 -0800
From: Ian Rogers <irogers@...gle.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, "Liang, Kan" <kan.liang@...ux.intel.com>,
Leo Yan <leo.yan@....com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [PATCH] libperf: Add back guard on MAX_NR_CPUS
On Tue, Jan 7, 2025 at 12:50 AM Christophe Leroy
<christophe.leroy@...roup.eu> wrote:
>
> Le 06/01/2025 à 21:05, Ian Rogers a écrit :
> > On Mon, Jan 6, 2025 at 11:38 AM Christophe Leroy
> > <christophe.leroy@...roup.eu> wrote:
> >>
> >> Building perf with EXTRA_CFLAGS="-DMAX_NR_CPUS=1" fails:
> >>
> >> CC /home/chleroy/linux-powerpc/tools/perf/libperf/cpumap.o
> >> cpumap.c:16: error: "MAX_NR_CPUS" redefined [-Werror]
> >> 16 | #define MAX_NR_CPUS 4096
> >> |
> >> <command-line>: note: this is the location of the previous definition
> >>
> >> Commit e8399d34d568 ("libperf cpumap: Hide/reduce scope of MAX_NR_CPUS")
> >> moved definition of MAX_NR_CPUS from lib/perf/include/internal/cpumap.h
> >> to lib/perf/cpumap.c but the guard surrounding that definition got lost
> >> in the move.
> >>
> >> See commit 21b8732eb447 ("perf tools: Allow overriding MAX_NR_CPUS at
> >> compile time") to see why it is needed.
> >>
> >> Note that MAX_NR_CPUS was initialy defined in perf/perf.h and a
> >> redundant definition was added by commit 9c3516d1b850 ("libperf:
> >> Add perf_cpu_map__new()/perf_cpu_map__read() functions").
> >>
> >> A cleaner fix would be to remove that duplicate but for the time
> >> being fix the problem by bringing back the guard for when MAX_NR_CPUS
> >> is already defined.
> >>
> >> Fixes: e8399d34d568 ("libperf cpumap: Hide/reduce scope of MAX_NR_CPUS")
> >> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
> >> Reviewed-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> >
> > Hello,
> >
> > I believe this change might be unnecessary. The only use of
> > MAX_NR_CPUS is in a warning message within perf_cpu_map__new, which
> > takes a string and produces a perf_cpu_map. Other similar functions
> > like cpu_map__new_sysconf don't check MAX_NR_CPUS. Therefore,
> > specifying a -DMAX_NR_CPUS value on the build command line has little
> > effect—it only impacts a warning message for certain kinds of
> > perf_cpu_map creation. It's also unclear what the intended outcome is
> > on the build command line.
> >
> > Given that specifying the value doesn't seem to have a clear purpose,
> > allowing the build to break might be the best option. This would alert
> > the person building perf that they are doing something that doesn't
> > make sense.
> >
>
> Ok so I looked at it once more and indeed it looks like it has changed
> since 2017. See commit 21b8732eb447 ("perf tools: Allow overriding
> MAX_NR_CPUS at compile time") to understand why it was required at that
> time.
>
> Now I don't have much size difference anymore between a build with
> MAX_NR_CPUS=1 and the default MAX_NR_CPUS=4096:
>
> $ size perf perf-1cpu
> text data bss dec hex filename
> 3415908 104164 17148 3537220 35f944 perf
> 3415904 104164 16124 3536192 35f540 perf-1cpu
>
> Apparently that was changed by commit 6a1e2c5c2673 ("perf stat: Remove a
> set of shadow stats static variables")
>
> So I agree with you, it is apparently not worth reducing MAX_NR_CPUS
> anymore, I'll give it a try.
Thanks for figuring this out and the explanation. Fwiw, the shadow
stats should be gone. Perf obviously reads counters, as there may be
counters on multiple CPUs then it aggregates these counts together.
The shadow stats were another aggregation mechanism used for metrics
and being a simple person I managed to land just having the single
aggregation approach although there's some cleanup in this area hiding
as tech debt in perf script. The support was removed around 12 minor
versions ago, iirc.
As a heads up given the interest in binary size, most of perf's binary
size comes from the json events. Off the top of my head I'd say it is
like 70% on x86. We avoid string relocations on those by storing
constant offsets and at some point I'd like to get compression going -
although I'd like to compress across files, something 7-zip's format
at least attempts but there was recent 7-zip security news which may
mean a dependency is a bad thing. Anyway, most of the json events
likely aren't needed by the platform you are building for and I added
some build support for only including the ones you are. As you may not
know your CPU's model name I added a mechanism to use CPUIDs:
https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com
But perhaps we can use x86-64 version levels to make this a more
automated approach, if we see say a "-march=x86-64-v4" in the CFLAGS -
the idea being that say nehalem doesn't support x86-64-v4 so we can
safely drop the nehalem json data. That said x86-64 version levels
have been described as, "some crazy glibc artifact and is stupid and
needs to die":
https://lore.kernel.org/lkml/CAHk-=wh_b8b1qZF8_obMKpF+xfYnPZ6t38F1+5pK-eXNyCdJ7g@mail.gmail.com/
For metrics I have a similar problem, knowing which models support
which events, but to handle that I use the json and see if it has an
event, if not the metric isn't created. This causes issues when new
models rename events. Anyway, those patches haven't landed, here are
the Intel versions:
https://lore.kernel.org/lkml/20240926175035.408668-1-irogers@google.com/
Thanks,
Ian
Powered by blists - more mailing lists