Message-ID: <cover.1730150953.git.jpoimboe@kernel.org>
Date: Mon, 28 Oct 2024 14:47:27 -0700
From: Josh Poimboeuf <jpoimboe@...nel.org>
To: x86@...nel.org
Cc: Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	linux-kernel@...r.kernel.org,
	Indu Bhagat <indu.bhagat@...cle.com>,
	Mark Rutland <mark.rutland@....com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Adrian Hunter <adrian.hunter@...el.com>,
	linux-perf-users@...r.kernel.org,
	Mark Brown <broonie@...nel.org>,
	linux-toolchains@...r.kernel.org,
	Jordan Rome <jordalgo@...a.com>,
	Sam James <sam@...too.org>,
	linux-trace-kernel@...r.kernel.org,
	Andrii Nakryiko <andrii.nakryiko@...il.com>,
	Jens Remus <jremus@...ux.ibm.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Florian Weimer <fweimer@...hat.com>,
	Andy Lutomirski <luto@...nel.org>
Subject: [PATCH v3 00/19] unwind, perf: sframe user space unwinding

This has all the changes discussed in v2, plus VDSO sframe support and
Namhyung's perf tool patches (see detailed changelog below).

I did quite a bit of testing and it seems to work well.  It still needs
some binutils and glibc patches, which I'll send in a reply.

Questions for perf experts:

  - Is the perf_event lifetime managed correctly or do we need to do
    something to ensure it exists in unwind_user_task_work()?

    Or alternatively is the original perf_event even needed in
    unwind_user_task_work() or can a new one be created on demand?

  - Is --call-graph=sframe needed for consistency?

  - Should perf use the context cookie?  Note that because the callback
    is usually only called once for multiple NMIs in the same entry
    context, it's possible for the PERF_RECORD_CALLCHAIN_DEFERRED event
    to arrive *before* some of the corresponding kernel events.  The
    context cookie disambiguates the corner cases.
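
To make the ordering problem in the last question concrete, here is a
minimal user space sketch of how a tool could pair kernel samples with a
deferred user callchain by cookie.  Everything in it is hypothetical: the
record layout, the names (feed(), replay_example()) and the assumption that
the cookie is unique per kernel entry context are for illustration only and
are not the perf ABI or the perf tool implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified record types; not the perf ABI layout. */
enum rec_type { REC_KERNEL_SAMPLE, REC_USER_DEFERRED };

struct record {
	enum rec_type type;
	uint64_t cookie;	/* assumed: one value per kernel entry context */
	const char *chain;	/* kernel chain for samples, user chain otherwise */
};

#define MAX_PENDING 16

static struct record pending[MAX_PENDING];	/* samples awaiting a user chain */
static int nr_pending;
static struct record last_user;			/* most recent deferred chain */
static int have_user;

/* Print one merged callchain. */
static void emit(const char *kchain, const char *uchain)
{
	printf("%s | %s\n", kchain, uchain);
}

/* Feed one record; returns the number of merged callchains emitted. */
static int feed(const struct record *r)
{
	int i, kept = 0, emitted = 0;

	if (r->type == REC_KERNEL_SAMPLE) {
		/* Late sample: its user chain has already arrived. */
		if (have_user && last_user.cookie == r->cookie) {
			emit(r->chain, last_user.chain);
			return 1;
		}
		assert(nr_pending < MAX_PENDING);
		pending[nr_pending++] = *r;
		return 0;
	}

	/* Deferred user chain: complete every pending sample it belongs to. */
	for (i = 0; i < nr_pending; i++) {
		if (pending[i].cookie == r->cookie) {
			emit(pending[i].chain, r->chain);
			emitted++;
		} else {
			pending[kept++] = pending[i];
		}
	}
	nr_pending = kept;
	last_user = *r;
	have_user = 1;
	return emitted;
}

/* Replay a stream in which the deferred record precedes a late sample. */
int replay_example(void)
{
	static const struct record stream[] = {
		{ REC_KERNEL_SAMPLE, 1, "nmi_a" },
		{ REC_USER_DEFERRED, 1, "main>work" },
		{ REC_KERNEL_SAMPLE, 1, "nmi_b" },	/* arrives after its user chain */
		{ REC_KERNEL_SAMPLE, 2, "nmi_c" },	/* next entry context, stays pending */
	};
	int total = 0;
	size_t i;

	for (i = 0; i < sizeof(stream) / sizeof(stream[0]); i++)
		total += feed(&stream[i]);
	return total;
}
```

Call replay_example() from a main() to try it.  Without the cookie, the
late "nmi_b" sample would be indistinguishable from a sample belonging to
the next entry context and would wait for the wrong user callchain.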

Based on tip/master.

Also at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git sframe-v3


v3:
- move the "deferred" logic out of perf and into unwind_user with new
  unwind_user_deferred() interface [Steven, Mathieu]
- add more sframe sanity checks [Steven]
- make frame pointers optional depending on arch [Jens]
- fix perf event output [Namhyung]
- include Namhyung's perf tool patches
- enable sframe generation in VDSO
- fix build errors [robot]

v2: https://lore.kernel.org/cover.1726268190.git.jpoimboe@kernel.org
- rebase on v6.11-rc7
- reorganize the patches to add sframe first
- change to sframe v2
- add new perf event type: PERF_RECORD_CALLCHAIN_DEFERRED
- add new perf attribute: defer_callchain

v1: https://lore.kernel.org/cover.1699487758.git.jpoimboe@kernel.org

Some distros have started compiling frame pointers into all their
packages to enable the kernel to do system-wide profiling of user space.
Unfortunately that creates a runtime performance penalty across the
entire system.  Unwinding with DWARF (or .eh_frame) in the kernel instead
isn't feasible because it is too complex and too slow.

For in-kernel unwinding we solved this problem with the creation of the
ORC unwinder for x86_64.  Similarly, for user space the GNU assembler
can generate the SFrame ("Simple Frame") v2 format starting with binutils
2.41.

These patches add support for unwinding user space from the kernel using
SFrame with perf.  It should be easy to add user unwinding support for
other components like ftrace.

There were two main challenges:

1) Finding .sframe sections in shared/dlopened libraries

   The kernel has no visibility into the contents of shared libraries.
   This was solved by adding a PR_ADD_SFRAME option to prctl() which
   allows the runtime linker to manually provide the in-memory address
   of an .sframe section to the kernel.

2) Dealing with page faults

   Keeping all binaries' sframe data pinned would likely waste a lot of
   memory.  Instead, read it from user space on demand.  That can't be
   done from perf NMI context due to page faults, so defer the unwind to
   the next user exit.  Since the NMI handler doesn't do exit work,
   self-IPI and then schedule task work to be run on exit from the IPI.
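
The deferral described in 2) can be modeled in a few lines of user space C.
This is only a sketch of the control flow under simplifying assumptions:
struct task, sample_from_nmi() and exit_to_user() are made-up names, and a
single flag stands in for the self-IPI plus task work.  It is not the code
in kernel/unwind/user.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task state; the names are illustrative, not the kernel's. */
struct task {
	bool work_pending;	/* unwind requested for the next exit to user */
	int nmi_samples;	/* NMI samples seen since the last exit */
	int unwinds;		/* user unwinds actually performed */
};

/* NMI context: page faults are not allowed, so only request the unwind. */
void sample_from_nmi(struct task *t)
{
	t->nmi_samples++;
	if (!t->work_pending)
		t->work_pending = true;	/* stands in for self-IPI + task work */
}

/* Exit to user: faulting is fine here, so sframe data could be read. */
void exit_to_user(struct task *t)
{
	if (!t->work_pending)
		return;

	assert(t->nmi_samples > 0);
	t->unwinds++;
	printf("one user unwind covers %d NMI sample(s)\n", t->nmi_samples);
	t->work_pending = false;
	t->nmi_samples = 0;
}

/* Three NMIs in one entry context, then one more in the next context. */
int simulate(void)
{
	struct task t = { 0 };

	sample_from_nmi(&t);
	sample_from_nmi(&t);
	sample_from_nmi(&t);
	exit_to_user(&t);	/* one unwind for three samples */
	exit_to_user(&t);	/* nothing pending, no unwind */
	sample_from_nmi(&t);
	exit_to_user(&t);	/* second unwind */
	return t.unwinds;
}
```

Calling simulate() from a main() shows four NMI samples being served by two
unwinds, which is the same coalescing that makes the deferred callchain
arrive once per entry context.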

Special thanks to Indu for the original concept, and to Steven and Peter
for helping a lot with the design.  And to Steven for letting me do it ;-)



Josh Poimboeuf (15):
  x86/vdso: Fix DWARF generation for getrandom()
  x86/asm: Avoid emitting DWARF CFI for non-VDSO
  x86/asm: Fix VDSO DWARF generation with kernel IBT enabled
  x86/vdso: Use SYM_FUNC_{START,END} in __kernel_vsyscall()
  x86/vdso: Use CFI macros in __vdso_sgx_enter_enclave()
  x86/vdso: Enable sframe generation in VDSO
  unwind: Add user space unwinding API
  unwind/x86: Enable CONFIG_HAVE_UNWIND_USER_FP
  unwind: Introduce sframe user space unwinding
  unwind/x86: Enable CONFIG_HAVE_UNWIND_USER_SFRAME
  unwind: Add deferred user space unwinding API
  perf: Remove get_perf_callchain() 'init_nr' argument
  perf: Remove get_perf_callchain() 'crosstask' argument
  perf: Simplify get_perf_callchain() user logic
  perf: Add deferred user callchains

Namhyung Kim (4):
  perf tools: Minimal CALLCHAIN_DEFERRED support
  perf record: Enable defer_callchain for user callchains
  perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED
  perf tools: Merge deferred user callchains

 arch/Kconfig                              |  14 +
 arch/x86/Kconfig                          |   2 +
 arch/x86/entry/vdso/Makefile              |   6 +-
 arch/x86/entry/vdso/vdso-layout.lds.S     |   5 +-
 arch/x86/entry/vdso/vdso32/system_call.S  |  10 +-
 arch/x86/entry/vdso/vgetrandom-chacha.S   |   3 +-
 arch/x86/entry/vdso/vsgx.S                |  19 +-
 arch/x86/include/asm/dwarf2.h             |  40 ++-
 arch/x86/include/asm/linkage.h            |  29 +-
 arch/x86/include/asm/mmu.h                |   2 +-
 arch/x86/include/asm/unwind_user.h        |  11 +
 arch/x86/include/asm/vdso.h               |   1 -
 fs/binfmt_elf.c                           |  35 +-
 include/linux/entry-common.h              |   3 +
 include/linux/mm_types.h                  |   3 +
 include/linux/perf_event.h                |  12 +-
 include/linux/sched.h                     |   5 +
 include/linux/sframe.h                    |  41 +++
 include/linux/unwind_user.h               |  99 ++++++
 include/uapi/linux/elf.h                  |   1 +
 include/uapi/linux/perf_event.h           |  22 +-
 include/uapi/linux/prctl.h                |   3 +
 kernel/Makefile                           |   1 +
 kernel/bpf/stackmap.c                     |  14 +-
 kernel/events/callchain.c                 |  47 +--
 kernel/events/core.c                      |  70 +++-
 kernel/fork.c                             |  14 +
 kernel/sys.c                              |  11 +
 kernel/unwind/Makefile                    |   2 +
 kernel/unwind/sframe.c                    | 380 ++++++++++++++++++++++
 kernel/unwind/sframe.h                    | 215 ++++++++++++
 kernel/unwind/user.c                      | 318 ++++++++++++++++++
 mm/init-mm.c                              |   6 +
 tools/include/uapi/linux/perf_event.h     |  22 +-
 tools/lib/perf/include/perf/event.h       |   7 +
 tools/perf/Documentation/perf-script.txt  |   5 +
 tools/perf/builtin-script.c               |  92 ++++++
 tools/perf/util/callchain.c               |  24 ++
 tools/perf/util/callchain.h               |   3 +
 tools/perf/util/event.c                   |   1 +
 tools/perf/util/evlist.c                  |   1 +
 tools/perf/util/evlist.h                  |   1 +
 tools/perf/util/evsel.c                   |  32 +-
 tools/perf/util/evsel.h                   |   1 +
 tools/perf/util/machine.c                 |   1 +
 tools/perf/util/perf_event_attr_fprintf.c |   1 +
 tools/perf/util/sample.h                  |   3 +-
 tools/perf/util/session.c                 |  78 +++++
 tools/perf/util/tool.c                    |   2 +
 tools/perf/util/tool.h                    |   4 +-
 50 files changed, 1634 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/include/asm/unwind_user.h
 create mode 100644 include/linux/sframe.h
 create mode 100644 include/linux/unwind_user.h
 create mode 100644 kernel/unwind/Makefile
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h
 create mode 100644 kernel/unwind/user.c

-- 
2.47.0

