lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZVzPUjOiH6zpUlz5@FVFF77S0Q05N.cambridge.arm.com>
Date:   Tue, 21 Nov 2023 15:40:02 +0000
From:   Mark Rutland <mark.rutland@....com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     Hector Martin <marcan@...can.st>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Ian Rogers <irogers@...gle.com>,
        James Clark <james.clark@....com>,
        linux-perf-users@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Asahi Linux <asahi@...ts.linux.dev>
Subject: Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since
 v6.5

On Tue, Nov 21, 2023 at 03:24:25PM +0000, Marc Zyngier wrote:
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@...nel.org> wrote:
> > 
> > [Adding key people on Cc]
> > 
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@...can.st> wrote:
> > > 
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> > 
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > the PMU, but nothing works anymore.
> > 
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> > 
> > I'm happy to test potential fixes.
> 
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):

IIUC the tool is doing the wrong thing here and overriding explicit
${pmu}/${event}/ events with PERF_TYPE_HARDWARE events rather than events using
that ${pmu}'s type and event namespace.

Regardless of the *new* ABI that allows PERF_TYPE_HARDWARE events to be
targetted to a specific PMU, it's semantically wrong to rewrite events like
this since ${pmu}/${event}/ is not necessarily equivalent to a similarly-named
PERF_COUNT_HW_${EVENT}. 

Mark.

> <quote>
> maz@...ley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
> bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
> Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
> builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
> builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
> builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
> builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
> builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
> builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
> builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
> builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
> builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
> builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
> builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
> builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
> builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
> builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
> builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
> apple_icestorm_pmu/cycles/: -1: 0 873709 0
> apple_firestorm_pmu/cycles/: -1: 0 873709 0
> cycles: -1: 0 873709 0
> apple_icestorm_pmu/cycles/: 0 873709 0
> apple_firestorm_pmu/cycles/: 0 873709 0
> cycles: 0 873709 0
> 
>  Performance counter stats for 'ls':
> 
>      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
>      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
>      <not counted>      cycles                                                                  (0.00%)
> 
>        0.000002250 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
> 
> If I run the same thing on another CPU cluster (firestorm), I get
> this:
> 
> <quote>
> maz@...ley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e
>  apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> arch			builtin-diff.o      builtin-mem.o	 common-cmds.h    perf-completion.sh
> bench			builtin-evlist.c    builtin-probe.c	 CREDITS	  perf.h
> Build			builtin-evlist.o    builtin-probe.o	 design.txt	  perf-in.o
> builtin-annotate.c	builtin-ftrace.c    builtin-record.c	 dlfilters	  perf-iostat
> builtin-annotate.o	builtin-ftrace.o    builtin-record.o	 Documentation    perf-iostat.sh
> builtin-bench.c		builtin.h	    builtin-report.c	 FEATURE-DUMP	  perf.o
> builtin-bench.o		builtin-help.c      builtin-report.o	 include	  perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c	 jvmti		  perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c	 libapi	  PERF-VERSION-FILE
> builtin-buildid-list.c	builtin-inject.o    builtin-script.o	 libperf	  perf-with-kcore
> builtin-buildid-list.o	builtin-kallsyms.c  builtin-stat.c	 libsubcmd	  pmu-events
> builtin-c2c.c		builtin-kallsyms.o  builtin-stat.o	 libsymbol	  python
> builtin-c2c.o		builtin-kmem.c      builtin-timechart.c  Makefile	  python_ext_build
> builtin-config.c	builtin-kvm.c	    builtin-top.c	 Makefile.config  scripts
> builtin-config.o	builtin-kvm.o	    builtin-top.o	 Makefile.perf    tests
> builtin-daemon.c	builtin-kwork.c     builtin-trace.c	 MANIFEST	  trace
> builtin-daemon.o	builtin-list.c      builtin-version.c	 perf		  ui
> builtin-data.c		builtin-list.o      builtin-version.o	 perf-archive	  util
> builtin-data.o		builtin-lock.c      check-headers.sh	 perf-archive.sh
> builtin-diff.c		builtin-mem.c	    command-list.txt	 perf.c
> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> cycles: -1: 1034653 469125 469125
> apple_icestorm_pmu/cycles/: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: 1035035 469125 469125
> cycles: 1034653 469125 469125
> 
>  Performance counter stats for 'ls':
> 
>          1,035,101      apple_icestorm_pmu/cycles/                                            
>          1,035,035      apple_firestorm_pmu/cycles/                                           
>          1,034,653      cycles                                                                
> 
>        0.000001333 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
> 
> which doesn't make any sense either. I really don't understand what
> this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> nor what this 'cycle=0' stuff is.
> 
> /puzzled
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ