[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mzqhpduikzpeczmmxh5uyfzjpvdeae3ityqyp2rno4myujzb6y@ey346eksvoyf>
Date: Thu, 23 Oct 2025 01:09:02 -0700
From: Fangrui Song <maskray@...rceware.org>
To: Jens Remus <jremus@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
bpf@...r.kernel.org, x86@...nel.org, linux-mm@...ck.org,
Steven Rostedt <rostedt@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Andrii Nakryiko <andrii@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>, "Jose E. Marchesi" <jemarch@....org>,
Beau Belgrave <beaub@...ux.microsoft.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>, Florian Weimer <fweimer@...hat.com>, Kees Cook <kees@...nel.org>,
Carlos O'Donell <codonell@...hat.com>, Sam James <sam@...too.org>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, David Hildenbrand <david@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>
Subject: Re: [PATCH v11 00/15] unwind_deferred: Implement sframe handling
On 2025-10-22, Jens Remus wrote:
>This is the implementation of parsing the SFrame section in an ELF file.
>It's a continuation of Josh's and Steve's last work that can be found
>here:
>
> https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
> https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
>
>Currently the only way to get a user space stack trace from a stack
>walk (and not just copying large amount of user stack into the kernel
>ring buffer) is to use frame pointers. This has a few issues. The biggest
>one is that compiling frame pointers into every application and library
>has been shown to cause performance overhead.
>
>Another issue is that the format of the frames may not always be consistent
>between different compilers and some architectures (s390) has no defined
>format to do a reliable stack walk. The only way to perform user space
>profiling on these architectures is to copy the user stack into the kernel
>buffer.
>
>SFrames[1] is now supported in gcc binutils and soon will also be supported
>by LLVM.
Please consider dropping the statement, "soon will also be supported by LLVM."
Speaking as LLVM's MC, lld/ELF, and binary utilities maintainer, I have significant concerns about the v2 format, specifically its apparent disregard for standard ELF and linker conventions
(https://maskray.me/blog/2025-09-28-remarks-on-sframe#linking-and-execution-views)
To arm64 maintainers, it is critical time to revisit a unwind
information format, as I have outlined in my blog post:
A sorted address table like .eh_frame_hdr might still be needed, but the
design could be very different for arm64.
I am curious whether anyone has thought about a library that parses .eh_frame and generates SFrame.
If objtool integrates this library, it can generate SFrame for vmlinux and modules without relying on assembler/linker.
Linker and assembler requires a level of stability that is currently concerning on the toolchain side.
(https://sourceware.org/pipermail/binutils/2025-October/144974.html
"This "linker will DTRT" assertion glosses over significant
implementation complexity. Each version needs not just a reader but
version-specific *merging* logic in every linker—fundamentally different
from simply reading a format.")
>SFrames acts more like ORC, and lives in the ELF executable
>file as its own section. Like ORC it has two tables where the first table
>is sorted by instruction pointers (IP) and using the current IP and finding
>it's entry in the first table, it will take you to the second table which
>will tell you where the return address of the current function is located
>and then you can use that address to look it up in the first table to find
>the return address of that function, and so on. This performs a user
>space stack walk.
>
>Now because the SFrame section lives in the ELF file it needs to be faulted
>into memory when it is used. This means that walking the user space stack
>requires being in a faultable context. As profilers like perf request a stack
>trace in interrupt or NMI context, it cannot do the walking when it is
>requested. Instead it must be deferred until it is safe to fault in user
>space. One place this is known to be safe is when the task is about to return
>back to user space.
>
>This series makes the deferred unwind code implement SFrames.
>
>[1] https://sourceware.org/binutils/wiki/sframe
>
>Changes since v10:
>- Rebase on v6.17-rc1 with Peter's unwind user fixes and x86 support
> series [2] and Steve's support for the deferred unwinding infrastructure
> series in perf [3] and perf tool [4] on top.
>- Support for SFrame V2 PC-relative FDE function start address. (Jens)
>- Support for SFrame V2 representing RA undefined as indication for
> outermost frames. (Jens)
>
>[2]: [PATCH 00/12] Various fixes and x86 support,
> https://lore.kernel.org/all/20250924075948.579302904@infradead.org/
>[3]: [PATCH v16 0/4] perf: Support the deferred unwinding infrastructure,
> https://lore.kernel.org/all/20251007214008.080852573@kernel.org/
>[4]: [PATCH v16 0/4] perf tool: Support the deferred unwinding infrastructure,
> https://lore.kernel.org/all/20250908175319.841517121@kernel.org/
>
>Patches 1 and 2 are suggested fixups to patches from Peter's unwind user
>fixes and x86 support series. They keep the factoring out of the word
>size from the frame's CFA, FP, and RA offsets local to unwind user fp, as
>unwind user sframe does use absolute offsets.
>
>Patches 3, 6, and 14 have been updated to exclusively support the recent
>PC-relative SFrame FDE function start address encoding. With Binutils 2.45
>the SFrame V2 FDE function start address field value is an offset from the
>field (i.e. PC-relative) instead of from the .sframe section start. This
>is indicated by the new SFrame header flag SFRAME_F_FDE_FUNC_START_PCREL.
>Old SFrame V2 sections get rejected with dynamic debug message
>"bad/unsupported sframe header".
>
>Patches 9 and 10 add support to unwind user and unwind user sframe for
>a recent change of the SFrame V2 format to represent an undefined
>return address as an SFrame FRE without any offsets, which is used as
>indication for outermost frames. Note that currently only a development
>build of Binutils mainline generates SFrame information including this
>new indication for outermost frames. SFrame information without the new
>indication is still supported. Without these patches unwind user sframe
>would identify such new SFrame FREs without any offsets as corrupted and
>remove the .sframe section, causing any any further stack tracing using
>sframe to fail.
>
>Regards,
>Jens
>
>
>Jens Remus (4):
> fixup! unwind: Implement compat fp unwind
> fixup! unwind_user/x86: Enable frame pointer unwinding on x86
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
>
>Josh Poimboeuf (11):
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Store sframe section data in per-mm maple tree
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Wire up unwind_user to sframe
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe: Add prctl() interface for registering .sframe
> sections
>
> MAINTAINERS | 1 +
> arch/Kconfig | 23 ++
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/mmu.h | 2 +-
> arch/x86/include/asm/uaccess.h | 39 +-
> arch/x86/include/asm/unwind_user.h | 11 +-
> fs/binfmt_elf.c | 49 ++-
> include/linux/mm_types.h | 3 +
> include/linux/sframe.h | 60 +++
> include/linux/unwind_user_types.h | 5 +-
> include/uapi/linux/elf.h | 1 +
> include/uapi/linux/prctl.h | 6 +-
> kernel/fork.c | 10 +
> kernel/sys.c | 9 +
> kernel/unwind/Makefile | 3 +-
> kernel/unwind/sframe.c | 615 +++++++++++++++++++++++++++++
> kernel/unwind/sframe.h | 72 ++++
> kernel/unwind/sframe_debug.h | 68 ++++
> kernel/unwind/user.c | 56 ++-
> mm/init-mm.c | 2 +
> 20 files changed, 1004 insertions(+), 32 deletions(-)
> create mode 100644 include/linux/sframe.h
> create mode 100644 kernel/unwind/sframe.c
> create mode 100644 kernel/unwind/sframe.h
> create mode 100644 kernel/unwind/sframe_debug.h
>
>--
>2.48.1
>
Powered by blists - more mailing lists