lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU_NCohskhq=rh1wLPxWmSuZcTw-hzwOhBSPBn4cfg4MA@mail.gmail.com>
Date: Tue, 7 Jan 2025 08:46:02 -0800
From: Ian Rogers <irogers@...gle.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, "Liang, Kan" <kan.liang@...ux.intel.com>, 
	Leo Yan <leo.yan@....com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, 
	Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [PATCH] libperf: Add back guard on MAX_NR_CPUS

On Tue, Jan 7, 2025 at 12:50 AM Christophe Leroy
<christophe.leroy@...roup.eu> wrote:
>
> Le 06/01/2025 à 21:05, Ian Rogers a écrit :
> > On Mon, Jan 6, 2025 at 11:38 AM Christophe Leroy
> > <christophe.leroy@...roup.eu> wrote:
> >>
> >> Building perf with EXTRA_CFLAGS="-DMAX_NR_CPUS=1" fails:
> >>
> >>            CC      /home/chleroy/linux-powerpc/tools/perf/libperf/cpumap.o
> >>          cpumap.c:16: error: "MAX_NR_CPUS" redefined [-Werror]
> >>             16 | #define MAX_NR_CPUS 4096
> >>                |
> >>          <command-line>: note: this is the location of the previous definition
> >>
> >> Commit e8399d34d568 ("libperf cpumap: Hide/reduce scope of MAX_NR_CPUS")
> >> moved definition of MAX_NR_CPUS from lib/perf/include/internal/cpumap.h
> >> to lib/perf/cpumap.c but the guard surrounding that definition got lost
> >> in the move.
> >>
> >> See commit 21b8732eb447 ("perf tools: Allow overriding MAX_NR_CPUS at
> >> compile time") to see why it is needed.
> >>
> >> Note that MAX_NR_CPUS was initialy defined in perf/perf.h and a
> >> redundant definition was added by commit 9c3516d1b850 ("libperf:
> >> Add perf_cpu_map__new()/perf_cpu_map__read() functions").
> >>
> >> A cleaner fix would be to remove that duplicate but for the time
> >> being fix the problem by bringing back the guard for when MAX_NR_CPUS
> >> is already defined.
> >>
> >> Fixes: e8399d34d568 ("libperf cpumap: Hide/reduce scope of MAX_NR_CPUS")
> >> Signed-off-by: Christophe Leroy <christophe.leroy@...roup.eu>
> >> Reviewed-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> >
> > Hello,
> >
> > I believe this change might be unnecessary. The only use of
> > MAX_NR_CPUS is in a warning message within perf_cpu_map__new, which
> > takes a string and produces a perf_cpu_map. Other similar functions
> > like cpu_map__new_sysconf don't check MAX_NR_CPUS. Therefore,
> > specifying a -DMAX_NR_CPUS value on the build command line has little
> > effect—it only impacts a warning message for certain kinds of
> > perf_cpu_map creation. It's also unclear what the intended outcome is
> > on the build command line.
> >
> > Given that specifying the value doesn't seem to have a clear purpose,
> > allowing the build to break might be the best option. This would alert
> > the person building perf that they are doing something that doesn't
> > make sense.
> >
>
> Ok so I looked at it once more and indeed it looks like it has changed
> since 2017. See commit 21b8732eb447 ("perf tools: Allow overriding
> MAX_NR_CPUS at compile time") to understand why it was required at that
> time.
>
> Now I don't have much size difference anymore between a build with
> MAX_NR_CPUS=1 and the default MAX_NR_CPUS=4096:
>
> $ size perf perf-1cpu
>     text           data     bss     dec     hex filename
> 3415908  104164   17148 3537220  35f944 perf
> 3415904  104164   16124 3536192  35f540 perf-1cpu
>
> Apparently that was changed by commit 6a1e2c5c2673 ("perf stat: Remove a
> set of shadow stats static variables")
>
> So I agree with you, it is apparently not worth reducing MAX_NR_CPUS
> anymore, I'll give it a try.

Thanks for figuring this out and the explanation. Fwiw, the shadow
stats should be gone. Perf obviously reads counters, as there may be
counters on multiple CPUs then it aggregates these counts together.
The shadow stats were another aggregation mechanism used for metrics
and being a simple person I managed to land just having the single
aggregation approach although there's some cleanup in this area hiding
as tech debt in perf script. The support was removed around 12 minor
versions ago, iirc.

As a heads up given the interest in binary size, most of perf's binary
size comes from the json events. Off the top of my head I'd say it is
like 70% on x86. We avoid string relocations on those by storing
constant offsets and at some point I'd like to get compression going -
although I'd like to compress across files, something 7-zip's format
at least attempts but there was recent 7-zip security news which may
mean a dependency is a bad thing. Anyway, most of the json events
likely aren't needed by the platform you are building for and I added
some build support for only including the ones you are. As you may not
know your CPU's model name I added a mechanism to use CPUIDs:
https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com
But perhaps we can use x86-64 version levels to make this a more
automated approach, if we see say a "-march=x86-64-v4" in the CFLAGS -
the idea being that say nehalem doesn't support x86-64-v4 so we can
safely drop the nehalem json data. That said x86-64 version levels
have been described as, "some crazy glibc artifact and is stupid and
needs to die":
https://lore.kernel.org/lkml/CAHk-=wh_b8b1qZF8_obMKpF+xfYnPZ6t38F1+5pK-eXNyCdJ7g@mail.gmail.com/
For metrics I have a similar problem, knowing which models support
which events, but to handle that I use the json and see if it has an
event, if not the metric isn't created. This causes issues when new
models rename events. Anyway, those patches haven't landed, here are
the Intel versions:
https://lore.kernel.org/lkml/20240926175035.408668-1-irogers@google.com/

Thanks,
Ian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ