Message-ID: <CAP-5=fXFoawZnRD22iSev5FQnx3oyFOhrPf=gZbk84qGtr9NFA@mail.gmail.com>
Date: Mon, 3 Mar 2025 23:04:59 -0800
From: Ian Rogers <irogers@...gle.com>
To: Ian Rogers <irogers@...gle.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
John Garry <john.g.garry@...cle.com>, Will Deacon <will@...nel.org>,
James Clark <james.clark@...aro.org>, Mike Leach <mike.leach@...aro.org>,
Leo Yan <leo.yan@...ux.dev>, guoren <guoren@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>, Charlie Jenkins <charlie@...osinc.com>,
Bibo Mao <maobibo@...ngson.cn>, Huacai Chen <chenhuacai@...nel.org>,
Catalin Marinas <catalin.marinas@....com>, Jiri Slaby <jirislaby@...nel.org>,
Björn Töpel <bjorn@...osinc.com>,
Howard Chu <howardchu95@...il.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
"linux-csky@...r.kernel.org" <linux-csky@...r.kernel.org>, linux-riscv@...ts.infradead.org,
Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v4 00/11] perf: Support multiple system call tables in the build
On Mon, Mar 3, 2025 at 9:04 PM Ian Rogers <irogers@...gle.com> wrote:
>
> This work builds on the clean up of system call tables and removal of
> libaudit by Charlie Jenkins <charlie@...osinc.com>.
>
> The system call table in perf trace is used to map system call numbers
> to names and vice versa. Prior to these changes, a single table
> matching the perf binary's build was present. The table would be
> incorrect when tracing, say, a 32-bit binary from a 64-bit version of
> perf: the names and numbers wouldn't match.
>
> Change the build so that a single system call file is built and the
> potentially multiple tables are identifiable from the ELF machine type
> of the process being examined. To determine the ELF machine type, the
> executable's maps are searched and the associated DSOs' ELF headers
> are read. When this fails and perf is running live, /proc/pid/exe's
> ELF header is read. Fall back to the perf binary's own type when the
> machine type is still unknown.
>
> Remove some runtime types used by the system call tables and make
> equivalents generated at build time.
>
> v4: Read the e_machine from the DSOs in the thread's maps and only
> read from /proc/pid/exe on failure and when live, as requested by
> Namhyung. Add patches to add dso comments and to remove dso_data
> variables that are unused without libunwind.
This has allowed `perf trace record` (not just perf trace) to work
with binaries with an e_machine that doesn't match that of the perf
binary. An example:
Before:
```
$ file ./a.out
a.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=3fcd28f85a27a3108941661a91dbc675c06868f9, for GNU/Linux 3.2.0, not stripped
$ perf trace record -- ./a.out
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.059 MB perf.data (60 samples) ]
$ perf trace -i perf.data
? ( ): a.out/914959 ... [continued]: munmap()) = 0
0.019 ( 0.001 ms): a.out/914959 recvfrom(ubuf: 0x2, size: 4160602092, flags: DONTROUTE|CTRUNC|TRUNC|DONTWAIT|EOR|>
0.034 ( 0.002 ms): a.out/914959 lgetxattr(name: 0x2000, value: 0x3, size: 34) = 4160352256
0.043 ( 0.002 ms): a.out/914959 dup2(oldfd: -134405940, newfd: 4) = -1 ENOENT>
0.047 ( 0.009 ms): a.out/914959 preadv(fd: 4294967196, vec: 0xf7fce47f, vlen: 557056, pos_h: 4160602092) = 3
0.058 ( 0.004 ms): a.out/914959 lgetxattr(name: 0x1b5c2, value: 0x1, size: 2) = 4160237568
0.063 ( 0.000 ms): a.out/914959 lstat(filename: 0x3, statbuf: 0x1b5c2) = 0
0.071 ( 0.006 ms): a.out/914959 preadv(fd: 4294967196, vec: 0xf7f9f3e0, vlen: 557056, pos_h: 4160602092) = 3
0.078 ( 0.001 ms): a.out/914959 close(fd: 3) = 512
0.082 ( 0.002 ms): a.out/914959 lgetxattr(name: 0x23f8d0, value: 0x1, size: 2050) = 4157878272
0.084 ( 0.006 ms): a.out/914959 lgetxattr(pathname: 0xf7d66000, name: 0x18b000, value: 0x5, size: 2066) = 4158021>
0.091 ( 0.002 ms): a.out/914959 lgetxattr(pathname: 0xf7ef1000, name: 0x85000, value: 0x1, size: 2066) = 41596395>
0.093 ( 0.003 ms): a.out/914959 lgetxattr(pathname: 0xf7f76000, name: 0x3000, value: 0x3, size: 2066) = 4160184320
0.099 ( 0.002 ms): a.out/914959 lgetxattr(pathname: 0xf7f79000, name: 0x98d0, value: 0x3, size: 50) = 4160196608
0.106 ( 0.000 ms): a.out/914959 lstat(filename: 0x3) = 0
0.112 ( 0.001 ms): a.out/914959 mq_timedreceive(mqdes: 4287979520, u_msg_ptr: 0xf7f9fbb0, u_msg_prio: 0xf7fdbfec,>
0.113 ( 0.000 ms): a.out/914959 mkdirat(dfd: -134609624, pathname: 0xf7fdc910, mode: IFSOCK|ISUID|IRUSR|IWGRP|0xf>
0.114 ( 0.000 ms): a.out/914959 process_vm_writev(pid: -134609620, lvec: 0xc, liovcnt: 4160604432, rvec: 0xf7fa04>
0.154 ( 0.003 ms): a.out/914959 capget(header: 4160184320, dataptr: 8192) = 0
0.158 ( 0.002 ms): a.out/914959 capget(header: 1448792064, dataptr: 4096) = 0
0.163 ( 0.002 ms): a.out/914959 capget(header: 4160593920, dataptr: 8192) = 0
0.171 ( 0.001 ms): a.out/914959 getxattr(pathname: 0x3, name: 0xff955fe4, value: 0xf7f77e14, size: 1) = 0
0.179 ( 0.005 ms): a.out/914959 fchmod(fd: -134729728, mode: IFLNK|IFIFO|ISGID|IRWXU|IWOTH|0x10000) = 0
0.193 ( 0.008 ms): a.out/914959 preadv(fd: 4294967196, vec: 0x565ac008, pos_h: 4160192020) = 3
0.202 ( 0.007 ms): a.out/914959 close(fd: 3) = 1436
0.209 ( 0.017 ms): a.out/914959 stat(filename: 0x1, statbuf: 0xff9552fc) = 1436
0.234 (1000.083 ms): a.out/914959 readlinkat(buf: 0xff955224, bufsiz: 4287975964) = 0
```
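The wrong names above are what you get when i386 system call numbers are looked up in the x86-64 table: for instance, number 45 is brk on i386 but recvfrom on x86-64, and 295 is openat on i386 but preadv on x86-64. A minimal sketch of that mismatch, and of keying the lookup by ELF machine type instead, is below. The dictionary layout and the `syscall_name` helper are illustrative assumptions, not perf's actual data structures, and only a handful of well-known numbers are included.

```python
# ELF machine constants as defined in <elf.h>.
EM_386 = 3
EM_X86_64 = 62

# Hypothetical, heavily abbreviated tables keyed by e_machine.
# The real tables are generated at build time and cover every syscall.
SYSCALL_TABLES = {
    EM_386: {3: "read", 4: "write", 6: "close", 33: "access", 45: "brk",
             91: "munmap", 125: "mprotect", 295: "openat"},
    EM_X86_64: {3: "close", 4: "stat", 6: "lstat", 33: "dup2", 45: "recvfrom",
                91: "fchmod", 125: "capget", 295: "preadv"},
}


def syscall_name(e_machine, nr):
    """Return the syscall name for nr under e_machine, or None if unknown."""
    return SYSCALL_TABLES.get(e_machine, {}).get(nr)


if __name__ == "__main__":
    # Same numbers, decoded with the wrong table and then the right one.
    for nr in (45, 33, 295, 6, 3, 4):
        print(nr, syscall_name(EM_X86_64, nr), "->", syscall_name(EM_386, nr))
```

Selecting the inner table by the traced process's e_machine rather than by the perf binary's is what turns recvfrom back into brk in this toy version.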
After:
```
$ file ./a.out
a.out: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=3fcd28f85a27a3108941661a91dbc675c06868f9, for GNU/Linux 3.2.0, not stripped
$ perf trace record -- ./a.out
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.059 MB perf.data (60 samples) ]
$ perf trace -i perf.data
? ( ): a.out/908002 ... [continued]: execve()) = 0
0.019 ( 0.001 ms): a.out/908002 brk() = 0x57680000
0.041 ( 0.003 ms): a.out/908002 access(filename: 0xf7f0b0cc, mode: R) = -1 ENOENT>
0.046 ( 0.008 ms): a.out/908002 openat(dfd: CWD, filename: 0xf7f0747f, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
0.055 ( 0.001 ms): a.out/908002 statx(dfd: 3, filename: 0xf7f080f6, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MO>
0.061 ( 0.000 ms): a.out/908002 close(fd: 3) = 0
0.070 ( 0.006 ms): a.out/908002 openat(dfd: CWD, filename: 0xf7ed83e0, flags: RDONLY|CLOEXEC|LARGEFILE) = 3
0.077 ( 0.001 ms): a.out/908002 read(fd: 3, buf: 0xff80ea50, count: 512) = 512
0.079 ( 0.001 ms): a.out/908002 statx(dfd: 3, filename: 0xf7f080f6, flags: NO_AUTOMOUNT|EMPTY_PATH, mask: TYPE|MO>
0.104 ( 0.000 ms): a.out/908002 close(fd: 3) = 0
0.112 ( 0.000 ms): a.out/908002 set_tid_address(tidptr: 0xf7ed9528) = 908002 (a>
0.113 ( 0.000 ms): a.out/908002 set_robust_list(head: 0xf7ed952c, len: 12) = 0 (swappe>
0.114 ( 0.001 ms): a.out/908002 rseq(rseq: 0xf7ed9960, rseq_len: 32, sig: 1392848979) = 0 (swappe>
0.153 ( 0.003 ms): a.out/908002 mprotect(start: 0xf7eaf000, len: 8192, prot: READ) = 0
0.158 ( 0.002 ms): a.out/908002 mprotect(start: 0x565ef000, len: 4096, prot: READ) = 0
0.163 ( 0.002 ms): a.out/908002 mprotect(start: 0xf7f13000, len: 8192, prot: READ) = 0
0.177 ( 0.005 ms): a.out/908002 munmap(addr: 0xf7ebc000, len: 112066) = 0
0.189 ( 0.009 ms): a.out/908002 openat(dfd: CWD, filename: 0x565ee008) = 3
0.198 ( 0.006 ms): a.out/908002 read(fd: 3, buf: 0xff80e56c, count: 4096) = 1436
0.205 ( 0.017 ms): a.out/908002 write(fd: 1, buf: , count: 1436) = 1436
0.229 (1000.201 ms): a.out/908002 clock_nanosleep(rqtp: 0xff80e494, rmtp: 0xff80e48c) = 0
1000.486 ( ): a.out/908002 exit_group()
```
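Getting the right table depends on knowing the e_machine of the traced binary, which lives in the ELF header. As a rough illustration of the /proc/pid/exe step with a fallback, here is a sketch that reads e_machine from the first bytes of a file. The function names are made up for this example and this is not the code in the series; it relies only on the standard ELF layout, where e_type and e_machine are two 16-bit fields directly after the 16-byte e_ident in both ELF32 and ELF64.

```python
import struct

EI_NIDENT = 16    # size of the e_ident array
EI_DATA = 5       # index of the data-encoding byte within e_ident
ELFDATA2MSB = 2   # e_ident[EI_DATA] value for big-endian files
EM_NONE = 0       # "no machine", used here to signal failure


def read_e_machine(header):
    """Return e_machine from the leading bytes of an ELF file, or EM_NONE."""
    if len(header) < EI_NIDENT + 4 or header[:4] != b"\x7fELF":
        return EM_NONE
    endian = ">" if header[EI_DATA] == ELFDATA2MSB else "<"
    # e_type (2 bytes) and e_machine (2 bytes) follow e_ident.
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, EI_NIDENT)
    return e_machine


def e_machine_of(path, fallback):
    """Read e_machine from path; return fallback if it cannot be determined."""
    try:
        with open(path, "rb") as f:
            machine = read_e_machine(f.read(EI_NIDENT + 4))
    except OSError:
        return fallback
    return machine if machine != EM_NONE else fallback
```

For example, `e_machine_of("/proc/self/exe", fallback=62)` returns the machine type of the running interpreter on Linux, or 62 (EM_X86_64, standing in for the host type) when the file cannot be read.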
Thanks,
Ian
> v3: Add Charlie's reviewed-by tags. Incorporate feedback from Arnd
> Bergmann <arnd@...db.de> on additional optional column and MIPS
> system call numbering. Rebase past Namhyung's global system call
> statistics and add comments that they don't yet support an
> e_machine other than EM_HOST.
>
> v2: Change the 1 element cache for the last table as suggested by
> Howard Chu, add Howard's reviewed-by tags.
> Add a comment and apology to Charlie for not doing better in
> guiding:
> https://lore.kernel.org/all/20250114-perf_syscall_arch_runtime-v1-1-5b304e408e11@rivosinc.com/
> After discussion on v1, he agreed this patch series would be
> the better direction.
>
> Ian Rogers (11):
> perf dso: Move libunwind dso_data variables into ifdef
> perf dso: kernel-doc for enum dso_binary_type
> perf syscalltbl: Remove syscall_table.h
> perf trace: Reorganize syscalls
> perf syscalltbl: Remove struct syscalltbl
> perf dso: Add support for reading the e_machine type for a dso
> perf thread: Add support for reading the e_machine type for a thread
> perf trace beauty: Add syscalltbl.sh generating all system call tables
> perf syscalltbl: Use lookup table containing multiple architectures
> perf build: Remove Makefile.syscalls
> perf syscalltbl: Mask off ABI type for MIPS system calls
>
> tools/perf/Makefile.perf | 10 +-
> tools/perf/arch/alpha/entry/syscalls/Kbuild | 2 -
> .../alpha/entry/syscalls/Makefile.syscalls | 5 -
> tools/perf/arch/alpha/include/syscall_table.h | 2 -
> tools/perf/arch/arc/entry/syscalls/Kbuild | 2 -
> .../arch/arc/entry/syscalls/Makefile.syscalls | 3 -
> tools/perf/arch/arc/include/syscall_table.h | 2 -
> tools/perf/arch/arm/entry/syscalls/Kbuild | 4 -
> .../arch/arm/entry/syscalls/Makefile.syscalls | 2 -
> tools/perf/arch/arm/include/syscall_table.h | 2 -
> tools/perf/arch/arm64/entry/syscalls/Kbuild | 3 -
> .../arm64/entry/syscalls/Makefile.syscalls | 6 -
> tools/perf/arch/arm64/include/syscall_table.h | 8 -
> tools/perf/arch/csky/entry/syscalls/Kbuild | 2 -
> .../csky/entry/syscalls/Makefile.syscalls | 3 -
> tools/perf/arch/csky/include/syscall_table.h | 2 -
> .../perf/arch/loongarch/entry/syscalls/Kbuild | 2 -
> .../entry/syscalls/Makefile.syscalls | 3 -
> .../arch/loongarch/include/syscall_table.h | 2 -
> tools/perf/arch/mips/entry/syscalls/Kbuild | 2 -
> .../mips/entry/syscalls/Makefile.syscalls | 5 -
> tools/perf/arch/mips/include/syscall_table.h | 2 -
> tools/perf/arch/parisc/entry/syscalls/Kbuild | 3 -
> .../parisc/entry/syscalls/Makefile.syscalls | 6 -
> .../perf/arch/parisc/include/syscall_table.h | 8 -
> tools/perf/arch/powerpc/entry/syscalls/Kbuild | 3 -
> .../powerpc/entry/syscalls/Makefile.syscalls | 6 -
> .../perf/arch/powerpc/include/syscall_table.h | 8 -
> tools/perf/arch/riscv/entry/syscalls/Kbuild | 2 -
> .../riscv/entry/syscalls/Makefile.syscalls | 4 -
> tools/perf/arch/riscv/include/syscall_table.h | 8 -
> tools/perf/arch/s390/entry/syscalls/Kbuild | 2 -
> .../s390/entry/syscalls/Makefile.syscalls | 5 -
> tools/perf/arch/s390/include/syscall_table.h | 2 -
> tools/perf/arch/sh/entry/syscalls/Kbuild | 2 -
> .../arch/sh/entry/syscalls/Makefile.syscalls | 4 -
> tools/perf/arch/sh/include/syscall_table.h | 2 -
> tools/perf/arch/sparc/entry/syscalls/Kbuild | 3 -
> .../sparc/entry/syscalls/Makefile.syscalls | 5 -
> tools/perf/arch/sparc/include/syscall_table.h | 8 -
> tools/perf/arch/x86/entry/syscalls/Kbuild | 3 -
> .../arch/x86/entry/syscalls/Makefile.syscalls | 6 -
> tools/perf/arch/x86/include/syscall_table.h | 8 -
> tools/perf/arch/xtensa/entry/syscalls/Kbuild | 2 -
> .../xtensa/entry/syscalls/Makefile.syscalls | 4 -
> .../perf/arch/xtensa/include/syscall_table.h | 2 -
> tools/perf/builtin-trace.c | 290 +++++++++++-------
> tools/perf/scripts/Makefile.syscalls | 61 ----
> tools/perf/scripts/syscalltbl.sh | 86 ------
> tools/perf/trace/beauty/syscalltbl.sh | 274 +++++++++++++++++
> tools/perf/util/dso.c | 54 ++++
> tools/perf/util/dso.h | 56 ++++
> tools/perf/util/syscalltbl.c | 148 ++++-----
> tools/perf/util/syscalltbl.h | 22 +-
> tools/perf/util/thread.c | 80 +++++
> tools/perf/util/thread.h | 14 +-
> 56 files changed, 756 insertions(+), 509 deletions(-)
> delete mode 100644 tools/perf/arch/alpha/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/alpha/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/alpha/include/syscall_table.h
> delete mode 100644 tools/perf/arch/arc/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/arc/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/arc/include/syscall_table.h
> delete mode 100644 tools/perf/arch/arm/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/arm/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/arm/include/syscall_table.h
> delete mode 100644 tools/perf/arch/arm64/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/arm64/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/arm64/include/syscall_table.h
> delete mode 100644 tools/perf/arch/csky/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/csky/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/csky/include/syscall_table.h
> delete mode 100644 tools/perf/arch/loongarch/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/loongarch/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/loongarch/include/syscall_table.h
> delete mode 100644 tools/perf/arch/mips/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/mips/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/mips/include/syscall_table.h
> delete mode 100644 tools/perf/arch/parisc/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/parisc/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/parisc/include/syscall_table.h
> delete mode 100644 tools/perf/arch/powerpc/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/powerpc/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/powerpc/include/syscall_table.h
> delete mode 100644 tools/perf/arch/riscv/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/riscv/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/riscv/include/syscall_table.h
> delete mode 100644 tools/perf/arch/s390/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/s390/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/s390/include/syscall_table.h
> delete mode 100644 tools/perf/arch/sh/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/sh/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/sh/include/syscall_table.h
> delete mode 100644 tools/perf/arch/sparc/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/sparc/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/sparc/include/syscall_table.h
> delete mode 100644 tools/perf/arch/x86/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/x86/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/x86/include/syscall_table.h
> delete mode 100644 tools/perf/arch/xtensa/entry/syscalls/Kbuild
> delete mode 100644 tools/perf/arch/xtensa/entry/syscalls/Makefile.syscalls
> delete mode 100644 tools/perf/arch/xtensa/include/syscall_table.h
> delete mode 100644 tools/perf/scripts/Makefile.syscalls
> delete mode 100755 tools/perf/scripts/syscalltbl.sh
> create mode 100755 tools/perf/trace/beauty/syscalltbl.sh
>
> --
> 2.48.1.711.g2feabab25a-goog
>