Message-ID: <20260127150554.2760964-1-jremus@linux.ibm.com>
Date: Tue, 27 Jan 2026 16:05:35 +0100
From: Jens Remus <jremus@...ux.ibm.com>
To: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        bpf@...r.kernel.org, x86@...nel.org, linux-mm@...ck.org,
        Steven Rostedt <rostedt@...nel.org>
Cc: Jens Remus <jremus@...ux.ibm.com>, Josh Poimboeuf <jpoimboe@...nel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrii Nakryiko <andrii@...nel.org>,
        Indu Bhagat <indu.bhagat@...cle.com>,
        "Jose E. Marchesi" <jemarch@....org>,
        Beau Belgrave <beaub@...ux.microsoft.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Florian Weimer <fweimer@...hat.com>, Kees Cook <kees@...nel.org>,
        "Carlos O'Donell" <codonell@...hat.com>, Sam James <sam@...too.org>,
        Dylan Hatch <dylanbhatch@...gle.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>
Subject: [PATCH v13 00/18] unwind_deferred: Implement sframe handling

This series implements parsing of the SFrame V3 stack trace information
from the .sframe section of an ELF file.  It is a continuation of Josh's and
Steve's work, which can be found here:

   https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
   https://lore.kernel.org/all/20250827201548.448472904@kernel.org/

Currently the only way to get a user space stack trace from a stack
walk (and not by just copying a large amount of the user stack into the
kernel ring buffer) is to use frame pointers. This has a few issues. The
biggest one is that compiling frame pointers into every application and
library has been shown to cause performance overhead.

Another issue is that the format of the frames may not always be consistent
between different compilers, and some architectures (s390) have no defined
format that allows a reliable stack walk. The only way to perform user space
profiling on these architectures is to copy the user stack into the kernel
buffer.

SFrame [1] is now supported in binutils (x86-64, ARM64, and s390). There are
discussions going on about supporting SFrame in LLVM. SFrame acts more like
ORC, and lives in the ELF executable file as its own section. Like ORC, it
has two tables. The first table is sorted by instruction pointer (IP).
Looking up the current IP in the first table leads to an entry in the
second table, which tells you where the return address of the current
function is located. That return address is then looked up in the first
table to find the return address of the calling function, and so on.
Repeating this performs a user space stack walk.
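
The two-table walk described above can be sketched as a toy model. This is
only an illustration in Python, not the SFrame on-disk layout and not the
kernel code: the table contents, the names find_function() and walk(), and
the single frame rule per function are all assumptions made for the example
(real SFrame data can hold several rules per function, keyed by IP range).

```python
from bisect import bisect_right

# Hypothetical first table: function start addresses, sorted by IP.
FUNC_STARTS = [0x1000, 0x2000, 0x3000]

# Hypothetical second table: for each function, the CFA offset from SP
# and the offset from the CFA at which the return address is stored.
FRAME_INFO = {
    0x1000: (16, -8),
    0x2000: (32, -8),
    0x3000: (16, -8),
}


def find_function(ip):
    """Binary-search the sorted first table for the function containing ip."""
    i = bisect_right(FUNC_STARTS, ip) - 1
    return FUNC_STARTS[i] if i >= 0 else None


def walk(ip, sp, stack, max_frames=16):
    """Walk a simulated stack (a dict of address -> value)."""
    trace = []
    while len(trace) < max_frames:
        func = find_function(ip)
        if func is None:
            break
        trace.append(ip)
        cfa_off, ra_off = FRAME_INFO[func]
        cfa = sp + cfa_off
        ra = stack.get(cfa + ra_off)
        if ra is None:
            # No return address available: treat as the outermost frame.
            break
        # The return address becomes the next IP to look up.
        ip, sp = ra, cfa
    return trace
```

Each iteration uses the sorted table to find the entry for the current IP,
recovers the return address from the simulated stack, and repeats with that
return address as the new IP.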

Because the .sframe section lives in the ELF file, it needs to be faulted
into memory when it is used. This means that walking the user space stack
requires being in a faultable context. As profilers like perf request a stack
trace in interrupt or NMI context, the walk cannot be done at the time it is
requested. Instead it must be deferred until it is safe to fault in user
space memory. One place this is known to be safe is when the task is about
to return to user space.
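
The deferral idea can be modelled as follows. This is a minimal sketch in
Python and does not reflect the kernel's unwind_deferred interface: the
Task class and the functions request_unwind() and return_to_user() are
hypothetical names chosen for the example.

```python
class Task:
    """Hypothetical stand-in for a task with pending unwind requests."""

    def __init__(self, name):
        self.name = name
        self.pending = []  # callbacks waiting for a faultable context


def request_unwind(task, callback):
    """Model of a request made in interrupt/NMI context.

    Nothing is unwound here; the request is only recorded.
    """
    if callback not in task.pending:
        task.pending.append(callback)


def return_to_user(task, unwind):
    """Model of the point just before the task returns to user space.

    Faulting is safe here, so the stack walk runs once and its result
    is handed to every requester that asked for it.
    """
    if not task.pending:
        return
    trace = unwind(task)
    callbacks, task.pending = task.pending, []
    for callback in callbacks:
        callback(task, trace)
```

In this model a request only queues a callback, and the single stack walk
happens later at the safe point.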

This series makes the deferred unwind user code implement SFrame format V3
and enables it on x86-64.

[1]: https://sourceware.org/binutils/wiki/sframe


This series applies on top of the tip perf/core branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git  perf/core

The user space programs (and libraries) to be stack-traced need to be
built with the recent SFrame stack trace information format V3, as
generated by the upcoming binutils 2.46 with the assembler option --gsframe.
binutils can be built from source from the binutils-2_46-branch branch:

  git://sourceware.org/git/binutils-gdb.git  binutils-2_46-branch

Namhyung Kim's related perf tools deferred callchain support can be used
for testing ("perf record --call-graph fp,defer" and "perf report/script").


Changes since v12 (see patch notes for details):
- Rebase on tip perf/core branch (d55c571e4333).
- Add support for SFrame V3, including its new flexible FDEs.  SFrame V2
  is not supported.

Changes since v11 (see patch notes for details):
- Rebase on tip master branch (f8fdee44bf2f) with Namhyung Kim's
  perf/defer-callchain-v4 branch merged on top.
- Adjust to Peter's latest unwind user enhancements.
- Simplify logic by using an internal SFrame FDE representation, whose
  FDE function start address field is an address instead of a PC-relative
  offset (from FDE).
- Rename struct sframe_fre to sframe_fre_internal to align with
  struct sframe_fde_internal.
- Remove unused pt_regs from unwind_user_next_common() and its
  callers. (Peter)
- Simplify unwind_user_next_sframe(). (Peter)
- Fix a few checkpatch errors and warnings.
- Minor cleanups (e.g. move includes, fix indentation).

Changes since v10:
- Support for SFrame V2 PC-relative FDE function start address.
- Support for SFrame V2 representing RA undefined as indication for
  outermost frames.


Patches 1, 4, 11, and 17 have been updated to exclusively support the
latest SFrame V3 stack trace information format, which is generated by
the upcoming binutils 2.46 release.  Old SFrame V2 sections get rejected
with the dynamic debug message "bad/unsupported sframe header".
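
A version check of this kind can be sketched as below. This is not the code
from the patches: check_preamble() is a hypothetical helper, the sketch
assumes a little-endian preamble consisting of a 16-bit magic, an 8-bit
version and an 8-bit flags field, and it assumes that V3 is encoded as
version number 3. It returns the message as a string instead of emitting a
dynamic debug message.

```python
import struct

SFRAME_MAGIC = 0xDEE2    # magic number of the SFrame preamble
SUPPORTED_VERSION = 3    # assumption: SFrame V3 is encoded as 3

BAD_HEADER = "bad/unsupported sframe header"


def check_preamble(data):
    """Return None if the preamble is acceptable, else an error message."""
    if len(data) < 4:
        return BAD_HEADER
    # Assumed layout: u16 magic, u8 version, u8 flags (little-endian).
    magic, version, _flags = struct.unpack_from("<HBB", data)
    if magic != SFRAME_MAGIC or version != SUPPORTED_VERSION:
        return BAD_HEADER
    return None
```

Under these assumptions a V2 section fails the version comparison and is
rejected in the same way as a section with a wrong magic number.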

Patches 7 and 8 add support to unwind user (sframe) for outermost frames.

Patches 12-15 add support to unwind user (sframe) for the new SFrame V3
flexible FDEs.

Patch 16 improves the performance of searching the SFrame FRE for an IP.

Regards,
Jens


Jens Remus (7):
  unwind_user: Stop when reaching an outermost frame
  unwind_user/sframe: Add support for outermost frame indication
  unwind_user: Enable archs that pass RA in a register
  unwind_user: Flexible FP/RA recovery rules
  unwind_user: Flexible CFA recovery rules
  unwind_user/sframe: Add support for SFrame V3 flexible FDEs
  unwind_user/sframe: Separate reading of FRE from reading of FRE data
    words

Josh Poimboeuf (11):
  unwind_user/sframe: Add support for reading .sframe headers
  unwind_user/sframe: Store .sframe section data in per-mm maple tree
  x86/uaccess: Add unsafe_copy_from_user() implementation
  unwind_user/sframe: Add support for reading .sframe contents
  unwind_user/sframe: Detect .sframe sections in executables
  unwind_user/sframe: Wire up unwind_user to sframe
  unwind_user/sframe: Remove .sframe section on detected corruption
  unwind_user/sframe: Show file name in debug output
  unwind_user/sframe: Add .sframe validation option
  unwind_user/sframe/x86: Enable sframe unwinding on x86
  unwind_user/sframe: Add prctl() interface for registering .sframe
    sections

 MAINTAINERS                               |   1 +
 arch/Kconfig                              |  23 +
 arch/x86/Kconfig                          |   1 +
 arch/x86/include/asm/mmu.h                |   2 +-
 arch/x86/include/asm/uaccess.h            |  39 +-
 arch/x86/include/asm/unwind_user.h        |  69 +-
 arch/x86/include/asm/unwind_user_sframe.h |  12 +
 fs/binfmt_elf.c                           |  48 +-
 include/linux/mm_types.h                  |   3 +
 include/linux/sframe.h                    |  60 ++
 include/linux/unwind_user.h               |  18 +
 include/linux/unwind_user_types.h         |  46 +-
 include/uapi/linux/elf.h                  |   1 +
 include/uapi/linux/prctl.h                |   6 +-
 kernel/fork.c                             |  10 +
 kernel/sys.c                              |   9 +
 kernel/unwind/Makefile                    |   3 +-
 kernel/unwind/sframe.c                    | 840 ++++++++++++++++++++++
 kernel/unwind/sframe.h                    |  87 +++
 kernel/unwind/sframe_debug.h              |  68 ++
 kernel/unwind/user.c                      | 105 ++-
 mm/init-mm.c                              |   2 +
 22 files changed, 1414 insertions(+), 39 deletions(-)
 create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
 create mode 100644 include/linux/sframe.h
 create mode 100644 kernel/unwind/sframe.c
 create mode 100644 kernel/unwind/sframe.h
 create mode 100644 kernel/unwind/sframe_debug.h

-- 
2.51.0

