Message-ID: <CAP-5=fWeyorotfVz_y16ibakSwbNa0fapZoxSZ1nbkt1s=uGbw@mail.gmail.com>
Date:   Tue, 21 Nov 2023 07:41:17 -0800
From:   Ian Rogers <irogers@...gle.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     Mark Rutland <mark.rutland@....com>,
        Hector Martin <marcan@...can.st>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        James Clark <james.clark@....com>,
        linux-perf-users@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Asahi Linux <asahi@...ts.linux.dev>
Subject: Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5

On Tue, Nov 21, 2023 at 7:24 AM Marc Zyngier <maz@...nel.org> wrote:
>
> On Tue, 21 Nov 2023 13:40:31 +0000,
> Marc Zyngier <maz@...nel.org> wrote:
> >
> > [Adding key people on Cc]
> >
> > On Tue, 21 Nov 2023 12:08:48 +0000,
> > Hector Martin <marcan@...can.st> wrote:
> > >
> > > Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >
> > I can confirm that at least on 6.7-rc2, perf is pretty busted on any
> > asymmetric ARM platform. It isn't clear what criteria is used to pick
> > the PMU, but nothing works anymore.
> >
> > The saving grace in my case is that Debian still ships a 6.1 perftool
> > package, but that's obviously not going to last.
> >
> > I'm happy to test potential fixes.
>
> At Mark's request, I've dumped a couple of perf (as of -rc2) runs with
> -vvv.  And it is quite entertaining (this is taskset to an 'icestorm'
> CPU):
>
> <quote>
> maz@...ley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 0 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045843  cpu -1  group_fd -1  flags 0x8 = 5
> arch                    builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> bench                   builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> Build                   builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> builtin-annotate.c      builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> builtin-annotate.o      builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> builtin-bench.c         builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> builtin-bench.o         builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c     jvmti            perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c    libapi   PERF-VERSION-FILE
> builtin-buildid-list.c  builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> builtin-buildid-list.o  builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> builtin-c2c.c           builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> builtin-c2c.o           builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> builtin-config.c        builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> builtin-config.o        builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> builtin-daemon.c        builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> builtin-daemon.o        builtin-list.c      builtin-version.c    perf             ui
> builtin-data.c          builtin-list.o      builtin-version.o    perf-archive     util
> builtin-data.o          builtin-lock.c      check-headers.sh     perf-archive.sh
> builtin-diff.c          builtin-mem.c       command-list.txt     perf.c
> apple_icestorm_pmu/cycles/: -1: 0 873709 0
> apple_firestorm_pmu/cycles/: -1: 0 873709 0
> cycles: -1: 0 873709 0
> apple_icestorm_pmu/cycles/: 0 873709 0
> apple_firestorm_pmu/cycles/: 0 873709 0
> cycles: 0 873709 0
>
>  Performance counter stats for 'ls':
>
>      <not counted>      apple_icestorm_pmu/cycles/                                              (0.00%)
>      <not counted>      apple_firestorm_pmu/cycles/                                             (0.00%)
>      <not counted>      cycles                                                                  (0.00%)
>
>        0.000002250 seconds time elapsed
>
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
>
> If I run the same thing on another CPU cluster (firestorm), I get
> this:
>
> <quote>
> maz@...ley-girl:~/hot-poop/arm-platforms/tools/perf$ sudo taskset -c 2 ./perf stat -vvv -e apple_icestorm_pmu/cycles/ -e apple_firestorm_pmu/cycles/ -e cycles ls
> Using CPUID 0x00000000612f0280
> Attempt to add: apple_icestorm_pmu/cycles=0/
> ..after resolving event: apple_icestorm_pmu/cycles=0/
> Opening: unknown-hardware:HG
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   config                           0xb00000000
>   disabled                         1
> ------------------------------------------------------------
> sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
> sys_perf_event_open failed, error -95
> Attempt to add: apple_firestorm_pmu/cycles=0/
> ..after resolving event: apple_firestorm_pmu/cycles=0/
> Control descriptor is not initialized
> Opening: apple_icestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 3
> Opening: apple_firestorm_pmu/cycles/
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 4
> Opening: cycles
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 1045925  cpu -1  group_fd -1  flags 0x8 = 5
> arch                    builtin-diff.o      builtin-mem.o        common-cmds.h    perf-completion.sh
> bench                   builtin-evlist.c    builtin-probe.c      CREDITS          perf.h
> Build                   builtin-evlist.o    builtin-probe.o      design.txt       perf-in.o
> builtin-annotate.c      builtin-ftrace.c    builtin-record.c     dlfilters        perf-iostat
> builtin-annotate.o      builtin-ftrace.o    builtin-record.o     Documentation    perf-iostat.sh
> builtin-bench.c         builtin.h           builtin-report.c     FEATURE-DUMP     perf.o
> builtin-bench.o         builtin-help.c      builtin-report.o     include          perf-read-vdso.c
> builtin-buildid-cache.c  builtin-help.o      builtin-sched.c     jvmti            perf-sys.h
> builtin-buildid-cache.o  builtin-inject.c    builtin-script.c    libapi   PERF-VERSION-FILE
> builtin-buildid-list.c  builtin-inject.o    builtin-script.o     libperf          perf-with-kcore
> builtin-buildid-list.o  builtin-kallsyms.c  builtin-stat.c       libsubcmd        pmu-events
> builtin-c2c.c           builtin-kallsyms.o  builtin-stat.o       libsymbol        python
> builtin-c2c.o           builtin-kmem.c      builtin-timechart.c  Makefile         python_ext_build
> builtin-config.c        builtin-kvm.c       builtin-top.c        Makefile.config  scripts
> builtin-config.o        builtin-kvm.o       builtin-top.o        Makefile.perf    tests
> builtin-daemon.c        builtin-kwork.c     builtin-trace.c      MANIFEST         trace
> builtin-daemon.o        builtin-list.c      builtin-version.c    perf             ui
> builtin-data.c          builtin-list.o      builtin-version.o    perf-archive     util
> builtin-data.o          builtin-lock.c      check-headers.sh     perf-archive.sh
> builtin-diff.c          builtin-mem.c       command-list.txt     perf.c
> apple_icestorm_pmu/cycles/: -1: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: -1: 1035035 469125 469125
> cycles: -1: 1034653 469125 469125
> apple_icestorm_pmu/cycles/: 1035101 469125 469125
> apple_firestorm_pmu/cycles/: 1035035 469125 469125
> cycles: 1034653 469125 469125
>
>  Performance counter stats for 'ls':
>
>          1,035,101      apple_icestorm_pmu/cycles/
>          1,035,035      apple_firestorm_pmu/cycles/
>          1,034,653      cycles
>
>        0.000001333 seconds time elapsed
>
>        0.000000000 seconds user
>        0.000000000 seconds sys
> </quote>
>
> which doesn't make any sense either. I really don't understand what
> this PERF_TYPE_HARDWARE does here (the *real* types are 10 and 11),
> nor what this 'cycle=0' stuff is.

Hi Marc,

I'm unclear if you are running a newer perf tool on an older kernel or
not. In any case I'll assume the kernel and perf tool versions match.
In Linux 6.6 this patch was added to the ARM PMU:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/perf/arm_pmu.c?id=5c816728651ae425954542fed64d21d40cb75a9f

My guess is that the apple_icestorm_pmu requires a similar patch. The
perf tool is supposed to not use extended types when they aren't
supported:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n532
So I share your confusion as to why something broke.

PERF_TYPE_HARDWARE is a legacy type where hardcoded type and config
values correspond to an event. The PMU driver turns legacy events into
the real types. On big.LITTLE systems, if the legacy events are
monitoring a task, a different event is needed for each PMU (i.e. more
than one event). In your example you are monitoring 'ls', a task, and
so different cycles events are necessary. The PMU is identified in the
high 32 bits of the config value (the extended type).
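
To make the extended type encoding concrete, here is a minimal sketch
of how a PERF_TYPE_HARDWARE config value splits into a PMU type and a
legacy event id. The constants mirror PERF_PMU_TYPE_SHIFT and
PERF_HW_EVENT_MASK from include/uapi/linux/perf_event.h; the helper
names are hypothetical and are not taken from the perf tool. It is
only meant to show why the probe event in your log carries config
0xb00000000.

```python
# Constants mirror include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_PMU_TYPE_SHIFT = 32
PERF_HW_EVENT_MASK = 0xFFFFFFFF


def encode_extended_config(pmu_type, hw_event):
    """Hypothetical helper: put the PMU type in the high 32 bits of config."""
    return (pmu_type << PERF_PMU_TYPE_SHIFT) | (hw_event & PERF_HW_EVENT_MASK)


def decode_extended_config(config):
    """Hypothetical helper: split config into (pmu_type, legacy event id)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


if __name__ == "__main__":
    # config value taken from the failing probe event in the -vvv output
    pmu_type, hw_event = decode_extended_config(0xB00000000)
    print(f"pmu_type={pmu_type} hw_event={hw_event}")
```

Decoding 0xb00000000 this way gives PMU type 11 and event 0
(PERF_COUNT_HW_CPU_CYCLES), which lines up with one of the real types
(10 and 11) you mention. The three events that were opened
successfully have config 0, i.e. no PMU type in the high bits.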

Thanks for reporting the issue,
Ian

> /puzzled
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
