Message-ID: <e7926bca-318b-40a0-a586-83516302e8c1@efficios.com>
Date: Mon, 21 Jul 2025 16:58:43 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 bpf@...r.kernel.org, x86@...nel.org, Masami Hiramatsu <mhiramat@...nel.org>,
 Josh Poimboeuf <jpoimboe@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
 Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat <indu.bhagat@...cle.com>,
 "Jose E. Marchesi" <jemarch@....org>,
 Beau Belgrave <beaub@...ux.microsoft.com>, Jens Remus
 <jremus@...ux.ibm.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
 Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>,
 Florian Weimer <fweimer@...hat.com>, Sam James <sam@...too.org>,
 Brian Robbins <brianrob@...rosoft.com>,
 Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [RFC] New codectl(2) system call for sframe registration

On 2025-07-21 14:53, Steven Rostedt wrote:
> On Mon, 21 Jul 2025 11:20:34 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
>> Hi!
>>
>> I've written up an RFC for a new system call to handle sframe registration
>> for shared libraries. There has been interest to cover both sframe in
>> the short term, but also JIT use-cases in the long term, so I'm
>> covering both here in this RFC to provide the full context. Implementation
>> wise we could start by only covering the sframe use-case.
>>
>> I've called it "codectl(2)" for now, but I'm of course open to feedback.
> 
> Hmm, I guess I'm OK with that name. I can't really think of anything that
> would be better. But kernel developers are notorious for sucking at coming
> up with decent names ;-)

I agree wholeheartedly. ;)

> 
>>
>> For ELF, I'm including the optional pathname, build id, and debug link
>> information which are really useful to translate from instruction pointers
>> to executable/library name, symbol, offset, source file, line number.
>> This is what we are using in LTTng-UST and Babeltrace debug-info filter
>> plugin [1], and I think this would be relevant for kernel tracers as well
>> so they can make the resulting stack traces meaningful to users.
> 
> Honestly, I'm not sure it needs to be an ELF file. Just a file that has an
> sframe section in it.

Indu told me on IRC that for GNU/Linux, SFrame will be an
allocated, loaded section in ELF files.

I'm planning to add optional fields (build id, debug link) that are
ELF-specific. I therefore think it's best to keep this specific to the
registration of an ELF file.

If there are other file types in the future that happen to contain an
sframe section (but are not ELF), then we can simply add a new label to
enum code_opt.

> 
>>
>> sys_codectl(2)
>> =================
>>
>> * arg0: unsigned int @option:
>>
>> /* Additional labels can be added to enum code_opt, for extensibility. */
>>
>> enum code_opt {
>>       CODE_REGISTER_ELF,
> 
> Perhaps the above should be: CODE_REGISTER_SFRAME,
> 
> as currently SFrame is read only via files.

As I pointed out above, on GNU/Linux, SFrame is always an allocated, loaded
ELF section. AFAIU, your comment implies that we'd want to support other scenarios
where the SFrame data lives in files outside of ELF binary sframe sections. Can you
expand on the use-case you have for this, or is it just for future-proofing?

> 
>>       CODE_REGISTER_JIT,
> 
> From our other conversations, JIT will likely be a completely different
> format than SFRAME, so calling it just JIT should be fine.

OK

> 
> 
>>       CODE_UNREGISTER,
> 
> I wonder if this should be the first enum. That is, "0" is to unregister.
> 
> That way, all non-zero options will be for what is being registered, and
> "0" is for unregistering any of them.

Good idea, I'll do that.
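
To make the reordering concrete, here is a minimal sketch of what the enum
could look like with unregistration as value 0. The explicit "= 0" and the
code_opt_is_register() helper are my own illustration, not part of the RFC.

```c
#include <assert.h>

/* Sketch only: 0 unregisters, every non-zero label registers something. */
enum code_opt {
	CODE_UNREGISTER = 0,
	CODE_REGISTER_ELF,
	CODE_REGISTER_JIT,
	/* Additional labels can be appended here for extensibility. */
};

/* The ordering this sketch relies on, checked at compile time. */
static_assert(CODE_UNREGISTER == 0, "unregister must be option 0");

/* Hypothetical helper: returns 1 if @option registers code, 0 otherwise. */
static int code_opt_is_register(unsigned int option)
{
	return option != CODE_UNREGISTER;
}
```

With this layout, a single "option != 0" test separates every registration
request from the unregister request, whatever labels get added later.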

> 
> 
>> };
>>
>> * arg1: void * @info
>>
>> /* if (@option == CODE_REGISTER_ELF) */
>>
>> /*
>>    * text_start, text_end, sframe_start, sframe_end allow unwinding of the
>>    * call stack.
>>    *
>>    * elf_start, elf_end, pathname, and either build_id or debug_link allow
>>    * mapping instruction pointers to file, symbol, offset, and source file
>>    * location.
>>    */
>> struct code_elf_info {
>>       __u64 elf_start;
>>       __u64 elf_end;
> 
> Perhaps:
> 
> 	__u64 file_start;
> 	__u64 file_end;
> 
> ?
> 
> And call it "struct code_sframe_info"
> 
>>       __u64 text_start;
>>       __u64 text_end;
> 
>>       __u64 sframe_start;
>>       __u64 sframe_end;
> 
> What is the above "sframe" for?
> 
>>       __u64 pathname;              /* char *, NULL if unavailable. */
>>
>>       __u64 build_id;              /* char *, NULL if unavailable. */
>>       __u64 debug_link_pathname;   /* char *, NULL if unavailable. */
> 
> Maybe just list the above three as "optional" ?

This is what I had in mind with "NULL if unavailable", but I can clarify
them as being "optional" in the comment.

Do you envision that sizeof(struct code_elf_info) could be smaller
and not include the optional fields, or is just specifying them as NULL if
unavailable enough?
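
For context, here is a rough userspace model of the size negotiation that
copy_struct_from_user() performs on the info_size argument. It is a sketch
under assumptions: copy_struct_sketch() is a made-up name, the struct uses
fixed-width stdint types in place of __u64/__u32, and no real user-memory
access is involved. It shows that a caller passing a shorter struct ends up
with the trailing fields read as zero, which is the same as passing NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout taken from the RFC, with stdint types for this sketch. */
struct code_elf_info {
	uint64_t elf_start;
	uint64_t elf_end;
	uint64_t text_start;
	uint64_t text_end;
	uint64_t sframe_start;
	uint64_t sframe_end;
	uint64_t pathname;            /* optional, 0 if unavailable */
	uint64_t build_id;            /* optional, 0 if unavailable */
	uint64_t debug_link_pathname; /* optional, 0 if unavailable */
	uint32_t build_id_len;
	uint32_t debug_link_crc;
};

/*
 * Hypothetical model of the size handling: @ksize is what the receiver
 * understands, @usize is what the caller passed as info_size.
 */
static int copy_struct_sketch(void *dst, size_t ksize,
			      const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *bytes = src;
	size_t i;

	assert(dst && src);

	/* Larger (newer) caller struct: unknown trailing bytes must be zero. */
	for (i = size; i < usize; i++)
		if (bytes[i])
			return -E2BIG;

	/* Smaller (older) caller struct: missing fields are zero-filled. */
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}
```

So a caller that does not care about the optional fields could pass
info_size = offsetof(struct code_elf_info, pathname) and get the same result
as one that passes the full struct with those fields set to NULL.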

> 
> It may be available, but the implementer just doesn't want to implement it.
> 
>>       __u32 build_id_len;
>>       __u32 debug_link_crc;
>> };
>>
>>
>> /* if (@option == CODE_REGISTER_JIT) */
>>
>> /*
>>    * Registration of sorted JIT unwind table: The reserved memory area is
>>    * of size reserved_len. Userspace increases used_len as new code is
>>    * populated between text_start and text_end. This area is populated in
>>    * increasing address order, and its ABI requires to have no overlapping
>>    * fre. This fits the common use-case where JITs populate code into
>>    * a given memory area by increasing address order. The sorted unwind
>>    * tables can be chained with a singly-linked list as they become full.
>>    * Consecutive chained tables are also in sorted text address order.
>>    *
>>    * Note: if there is an eventual use-case for unsorted jit unwind table,
>>    * this would be introduced as a new "code option".
>>    */
>>
>> struct code_jit_info {
>>       __u64 text_start;      /* addr >= text_start */
>>       __u64 text_end;        /* addr < text_end */
>>       __u64 unwind_head;     /* struct code_jit_unwind_table * */
>> };
>>
>> struct code_jit_unwind_fre {
>>       /*
>>        * Contains info similar to sframe, allowing unwind for a given
>>        * code address range.
>>        */
>>       __u32 size;
>>       __u32 ip_off;  /* offset from text_start */
>>       __s32 cfa_off;
>>       __s32 ra_off;
>>       __s32 fp_off;
>>       __u8 info;
>> };
>>
>> struct code_jit_unwind_table {
>>       __u64 reserved_len;
>>       __u64 used_len; /*
>>                        * Incremented by userspace (store-release), read by
>>                        * the kernel (load-acquire).
>>                        */
>>       __u64 next;     /* Chain with next struct code_jit_unwind_table. */
>>       struct code_jit_unwind_fre fre[];
>> };
> 
> I wonder if we should avoid the "jit" portion completely for now until we
> know what exactly we need.

I don't want to spend too much discussion time on the JIT portion at this stage,
but I think it's good to keep it in mind so we come up with an ABI that will
naturally extend to cover that use case. I favor keeping the JIT portion in these
discussions but not implementing it initially.
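
Just to keep the intended protocol in mind, here is a small userspace sketch
of the used_len publication scheme using C11 atomics. It rests on assumptions
that are not settled in the RFC: the *_sketch names are hypothetical, the
lengths are counted in entries rather than bytes, the table is a fixed-size
array instead of a flexible array member, and "size" is taken to be the
length in bytes of the code range an entry covers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_CAPACITY 16

struct fre_sketch {
	uint32_t size;    /* assumed: bytes of code covered by this entry */
	uint32_t ip_off;  /* offset from text_start */
	int32_t cfa_off;
	int32_t ra_off;
	int32_t fp_off;
	uint8_t info;
};

struct table_sketch {
	uint64_t reserved_len;      /* capacity, in entries */
	_Atomic uint64_t used_len;  /* published entries */
	struct fre_sketch fre[TABLE_CAPACITY];
};

/* Writer side: fill the entry first, then publish it with a store-release. */
static int table_append(struct table_sketch *t, const struct fre_sketch *e)
{
	uint64_t used = atomic_load_explicit(&t->used_len, memory_order_relaxed);

	assert(t && e);
	if (used >= t->reserved_len || used >= TABLE_CAPACITY)
		return -1;	/* full: a real user would chain a new table */
	/* Enforce increasing address order with no overlapping entries. */
	if (used > 0 && (uint64_t)t->fre[used - 1].ip_off +
			t->fre[used - 1].size > e->ip_off)
		return -1;
	t->fre[used] = *e;
	atomic_store_explicit(&t->used_len, used + 1, memory_order_release);
	return 0;
}

/* Reader side: load-acquire used_len, then binary search the sorted entries. */
static const struct fre_sketch *table_lookup(struct table_sketch *t,
					     uint32_t ip_off)
{
	uint64_t lo = 0;
	uint64_t hi = atomic_load_explicit(&t->used_len, memory_order_acquire);

	while (lo < hi) {
		uint64_t mid = lo + (hi - lo) / 2;
		const struct fre_sketch *e = &t->fre[mid];

		if (ip_off < e->ip_off)
			hi = mid;
		else if (ip_off - e->ip_off >= e->size)
			lo = mid + 1;
		else
			return e;
	}
	return NULL;
}
```

The release/acquire pairing is what lets the reader trust every entry below
the used_len value it observed, and the sorted, non-overlapping layout is what
makes the binary search valid.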

Thanks Steven!

Mathieu

> 
> Thanks,
> 
> -- Steve
> 
> 
>>
>> /* if (@option == CODE_UNREGISTER) */
>>
>> void *info
>>
>> * arg2: size_t info_size
>>
>> /*
>>    * Size of @info structure, allowing extensibility. See
>>    * copy_struct_from_user().
>>    */
>>
>> * arg3: unsigned int flags (0)
>>
>> /* Flags for extensibility. */
>>
>> Your feedback is welcome,
>>
>> Thanks,
>>
>> Mathieu
>>
>> [1] https://babeltrace.org/docs/v2.0/man7/babeltrace2-filter.lttng-utils.debug-info.7/
>>
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
