Message-ID: <2a85b4b4-a240-4e8b-b2f4-5eede3297082@efficios.com>
Date: Wed, 23 Jul 2025 11:15:34 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org, x86@...nel.org,
Josh Poimboeuf <jpoimboe@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat <indu.bhagat@...cle.com>,
"Jose E. Marchesi" <jemarch@....org>,
Beau Belgrave <beaub@...ux.microsoft.com>, Jens Remus
<jremus@...ux.ibm.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>,
Florian Weimer <fweimer@...hat.com>, Sam James <sam@...too.org>,
Brian Robbins <brianrob@...rosoft.com>,
Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [RFC] New codectl(2) system call for sframe registration
On 2025-07-22 20:26, Masami Hiramatsu (Google) wrote:
> Hi Mathieu,
>
> On Mon, 21 Jul 2025 11:20:34 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>
>> Hi!
>>
>> I've written up an RFC for a new system call to handle sframe registration
>> for shared libraries. There has been interest to cover both sframe in
>> the short term, but also JIT use-cases in the long term, so I'm
>> covering both here in this RFC to provide the full context. Implementation
>> wise we could start by only covering the sframe use-case.
>>
>> I've called it "codectl(2)" for now, but I'm of course open to feedback.
>
> Nice idea for JIT, but I doubt we need this for ELF.
>
>>
>> For ELF, I'm including the optional pathname, build id, and debug link
>> information which are really useful to translate from instruction pointers
>> to executable/library name, symbol, offset, source file, line number.
>
> For ELF file, does the kernel already know how to parse the elf header?
> I just wonder what happen if user sends different information to the
> kernel.

AFAIU, the kernel has an ELF parser that it uses on execve when it
executes a program, but the dynamic linking use-case all happens in
userspace. The kernel only maps memory and currently does not know that
it contains an ELF file.

The objective here is to allow registration of shared libraries' sframe
sections from the dynamic linker.
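
To illustrate what the dynamic linker already has at hand when it loads
an object, here is a minimal C sketch. It copies struct code_elf_info
from the RFC (with <stdint.h> types standing in for __u64/__u32) and
fills it from the program headers reported by dl_iterate_phdr(). The
helper names fill_elf_info() and first_object_cb() are made up for this
example, PT_GNU_SFRAME is defined locally in case <elf.h> predates it,
and the codectl(2) call itself is left out because the system call does
not exist yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <string.h>

#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME 0x6474e554	/* Assumed value, for older <elf.h>. */
#endif

/* Field layout taken from the RFC; fixed-width C types replace __u64/__u32. */
struct code_elf_info {
	uint64_t elf_start;
	uint64_t elf_end;
	uint64_t text_start;
	uint64_t text_end;
	uint64_t sframe_start;
	uint64_t sframe_end;
	uint64_t pathname;		/* char *, NULL if unavailable. */
	uint64_t build_id;		/* char *, NULL if unavailable. */
	uint64_t debug_link_pathname;	/* char *, NULL if unavailable. */
	uint32_t build_id_len;
	uint32_t debug_link_crc;
};

/* Hypothetical helper: derive the address ranges from one loaded object. */
static void fill_elf_info(const struct dl_phdr_info *obj,
			  struct code_elf_info *info)
{
	assert(obj && info);
	memset(info, 0, sizeof(*info));
	info->elf_start = UINT64_MAX;

	for (int i = 0; i < obj->dlpi_phnum; i++) {
		const ElfW(Phdr) *ph = &obj->dlpi_phdr[i];
		uint64_t start = obj->dlpi_addr + ph->p_vaddr;
		uint64_t end = start + ph->p_memsz;

		if (ph->p_type == PT_LOAD) {
			if (start < info->elf_start)
				info->elf_start = start;
			if (end > info->elf_end)
				info->elf_end = end;
			/*
			 * The struct holds a single text range, so only the
			 * first executable segment is recorded here.
			 */
			if ((ph->p_flags & PF_X) && !info->text_end) {
				info->text_start = start;
				info->text_end = end;
			}
		} else if (ph->p_type == PT_GNU_SFRAME) {
			info->sframe_start = start;
			info->sframe_end = end;
		}
	}
	if (!info->elf_end)
		info->elf_start = 0;	/* No PT_LOAD segment found. */
	info->pathname = (uint64_t)(uintptr_t)obj->dlpi_name;
}

/* dl_iterate_phdr() callback: describe the first object, then stop. */
static int first_object_cb(struct dl_phdr_info *obj, size_t size, void *data)
{
	(void)size;
	fill_elf_info(obj, data);
	return 1;	/* Non-zero ends the iteration. */
}
```

Calling dl_iterate_phdr(first_object_cb, &info) fills info for the main
program. sframe_start and sframe_end stay 0 when the object has no
sframe segment, and build id and debug link are left unset in this
sketch.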
>
>> This is what we are using in LTTng-UST and Babeltrace debug-info filter
>> plugin [1], and I think this would be relevant for kernel tracers as well
>> so they can make the resulting stack traces meaningful to users.
>>
>> sys_codectl(2)
>> =================
>>
>> * arg0: unsigned int @option:
>>
>> /* Additional labels can be added to enum code_opt, for extensibility. */
>>
>> enum code_opt {
>> CODE_REGISTER_ELF,
>> CODE_REGISTER_JIT,
>> CODE_UNREGISTER,
>> };
>>
>> * arg1: void * @info
>>
>> /* if (@option == CODE_REGISTER_ELF) */
>>
>> /*
>> * text_start, text_end, sframe_start, sframe_end allow unwinding of the
>> * call stack.
>> *
>> * elf_start, elf_end, pathname, and either build_id or debug_link allows
>> * mapping instruction pointers to file, symbol, offset, and source file
>> * location.
>> */
>> struct code_elf_info {
>> __u64 elf_start;
>> __u64 elf_end;
>> __u64 text_start;
>> __u64 text_end;
>
> What happen if there are multiple .text.* sections?
> Or, does it used for each text section?

That's a good point. I guess we could theoretically have a shared
object that has more than one text range, in which case we'd want to
register one sframe section for each text range. (Let me know
if I'm misunderstanding something here.)

This is an additional argument for having an sframe-specific
registration rather than an "elf" registration for the sframe
use-case.
>
>> __u64 sframe_start;
>> __u64 sframe_end;
>> __u64 pathname; /* char *, NULL if unavailable. */
>>
>> __u64 build_id; /* char *, NULL if unavailable. */
>> __u64 debug_link_pathname; /* char *, NULL if unavailable. */
>> __u32 build_id_len;
>> __u32 debug_link_crc;
>> };
>>
>>
>> /* if (@option == CODE_REGISTER_JIT) */
>>
>> /*
>> * Registration of sorted JIT unwind table: The reserved memory area is
>> * of size reserved_len. Userspace increases used_len as new code is
>> * populated between text_start and text_end. This area is populated in
>> * increasing address order, and its ABI requires to have no overlapping
>> * fre. This fits the common use-case where JITs populate code into
>> * a given memory area by increasing address order. The sorted unwind
>> * tables can be chained with a singly-linked list as they become full.
>> * Consecutive chained tables are also in sorted text address order.
>> *
>> * Note: if there is an eventual use-case for unsorted jit unwind table,
>> * this would be introduced as a new "code option".
>> */
>>
>> struct code_jit_info {
>> __u64 text_start; /* addr >= text_start */
>> __u64 text_end; /* addr < text_end */
>> __u64 unwind_head; /* struct code_jit_unwind_table * */
>> };
>>
>> struct code_jit_unwind_fre {
>> /*
>> * Contains info similar to sframe, allowing unwind for a given
>
> Hmm, why not just the sframe?
> (Is there any library to generate sframe online for JIT?)

The layout and size of the sframe section are fixed after it's been
registered. The jit unwind tables are meant to dynamically
grow as the JIT populates additional code. The goal here is to make sure
JITs don't have to issue a system call every time they add a few
functions, otherwise the overhead becomes a significant bottleneck.
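
As a rough userspace illustration of that publication scheme, the C11
sketch below appends entries to a table and publishes them with a
store-release on used_len, while a reader (standing in for the kernel)
uses a load-acquire before searching. The struct layouts follow the RFC
with <stdint.h> types. The function names jit_table_alloc(),
jit_table_append() and jit_table_lookup() are invented for this example,
and it assumes that reserved_len/used_len count entries and that size is
the length of the code range an entry covers, neither of which the RFC
pins down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct code_jit_unwind_fre {
	uint32_t size;		/* Assumed: length of the covered code range. */
	uint32_t ip_off;	/* offset from text_start */
	int32_t cfa_off;
	int32_t ra_off;
	int32_t fp_off;
	uint8_t info;
};

struct code_jit_unwind_table {
	uint64_t reserved_len;		/* Assumed: capacity in entries. */
	_Atomic uint64_t used_len;	/* Assumed: published entries. */
	uint64_t next;			/* Next chained table, 0 if none. */
	struct code_jit_unwind_fre fre[];
};

static struct code_jit_unwind_table *jit_table_alloc(uint64_t nr_entries)
{
	struct code_jit_unwind_table *t;

	t = calloc(1, sizeof(*t) + nr_entries * sizeof(t->fre[0]));
	if (!t)
		return NULL;
	t->reserved_len = nr_entries;
	atomic_init(&t->used_len, 0);
	return t;
}

/* Writer side (JIT): fill the entry, then publish it. No system call. */
static int jit_table_append(struct code_jit_unwind_table *t,
			    const struct code_jit_unwind_fre *fre)
{
	uint64_t used = atomic_load_explicit(&t->used_len,
					     memory_order_relaxed);

	if (used == t->reserved_len)
		return -1;	/* Full: the caller would chain a new table. */
	/* Entries are in increasing address order and do not overlap. */
	if (used > 0)
		assert(t->fre[used - 1].ip_off + t->fre[used - 1].size
		       <= fre->ip_off);
	t->fre[used] = *fre;
	/* Release: the entry contents are visible before the new length. */
	atomic_store_explicit(&t->used_len, used + 1, memory_order_release);
	return 0;
}

/* Reader side (kernel role): acquire the length, then binary search. */
static const struct code_jit_unwind_fre *
jit_table_lookup(struct code_jit_unwind_table *t, uint32_t ip_off)
{
	uint64_t lo = 0;
	uint64_t hi = atomic_load_explicit(&t->used_len,
					   memory_order_acquire);

	while (lo < hi) {
		uint64_t mid = lo + (hi - lo) / 2;
		const struct code_jit_unwind_fre *e = &t->fre[mid];

		if (ip_off < e->ip_off)
			hi = mid;
		else if (ip_off - e->ip_off >= e->size)
			lo = mid + 1;
		else
			return e;
	}
	return NULL;
}
```

The only shared state the writer touches after registration is the
table memory itself, which is what keeps the common path free of system
calls; a full table would be handled by allocating another one and
linking it through next.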
Thanks,
Mathieu
>
> Thank you,
>
>> * code address range.
>> */
>> __u32 size;
>> __u32 ip_off; /* offset from text_start */
>> __s32 cfa_off;
>> __s32 ra_off;
>> __s32 fp_off;
>> __u8 info;
>> };
>>
>> struct code_jit_unwind_table {
>> __u64 reserved_len;
>> __u64 used_len; /*
>> * Incremented by userspace (store-release), read by
>> * the kernel (load-acquire).
>> */
>> __u64 next; /* Chain with next struct code_jit_unwind_table. */
>> struct code_jit_unwind_fre fre[];
>> };
>>
>> /* if (@option == CODE_UNREGISTER) */
>>
>> void *info
>>
>> * arg2: size_t info_size
>>
>> /*
>> * Size of @info structure, allowing extensibility. See
>> * copy_struct_from_user().
>> */
>>
>> * arg3: unsigned int flags (0)
>>
>> /* Flags for extensibility. */
>>
>> Your feedback is welcome,
>>
>> Thanks,
>>
>> Mathieu
>>
>> [1] https://babeltrace.org/docs/v2.0/man7/babeltrace2-filter.lttng-utils.debug-info.7/
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>>
>
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com