lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c00790c-66c4-4bce-bd5f-7c67a3a121be@efficios.com>
Date: Tue, 22 Jul 2025 09:51:22 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 bpf@...r.kernel.org, x86@...nel.org, Masami Hiramatsu <mhiramat@...nel.org>,
 Josh Poimboeuf <jpoimboe@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
 Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat <indu.bhagat@...cle.com>,
 "Jose E. Marchesi" <jemarch@....org>,
 Beau Belgrave <beaub@...ux.microsoft.com>, Jens Remus
 <jremus@...ux.ibm.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
 Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>,
 Florian Weimer <fweimer@...hat.com>, Sam James <sam@...too.org>,
 Brian Robbins <brianrob@...rosoft.com>,
 Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [RFC] New codectl(2) system call for sframe registration

On 2025-07-21 17:15, Steven Rostedt wrote:
> On Mon, 21 Jul 2025 16:58:43 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
>>> Honestly, I'm not sure it needs to be an ELF file. Just a file that has an
>>> sframe section in it.
>>
>> Indu told me on IRC that for GNU/Linux, SFrame will be an
>> allocated,loaded section in elf files.
> 
> Yes it is, but is that a requirement for this interface? I just don't want
> to add requirements based on how thing currently work if they are not
> needed.

The ELF-specific optional fields I am suggesting (pathname, build id,
debug info) are useful for tooling. AFAIR this is what gdb uses to
find the debug info associated with each binary file.

> 
>>
>> I'm planning to add optional fields (build id, debug link) that are
>> ELF-specific. I therefore think it's best that we keep this specific as
>> registration of an elf file.
> 
> Here's a hypothetical, what if for some reason (say having the sframe
> sections outside of the elf file) that the linker shares that?

So your hypothetical scenario is having sframe provided as a separate
file. This sframe file (or part of it) would still describe how to
unwind a given elf file text range. So I would argue that this would
still fit into the model of CODE_REGISTER_ELF, it's just that the
address range from sframe_start to sframe_end would be mapped from
a different file. This is entirely up to the dynamic loader and should
not impact the kernel ABI.

AFAIK the a.out binary support was deprecated in Linux kernel v5.1. So
being elf specific is not an issue.

And if for some reason we end up inventing a new model to hand over the
sframe information in the future, for instance if we choose not to map
the sframe information in userspace and hand over a sframe-file pathname
and offset instead, we'll just extend the code_opt enum with a new
label.

> 
> For instance, if the sframe sections are downloaded separately as a
> separate package for a given executable (to make it not mandatory for an
> install), the linker could be smart enough to see that they exist in some
> special location and then pass that to the kernel. In other words, this is
> option is specific for sframe and not ELF. I rather call it by that.

As I explained above, if the dynamic loader populates the sframe section
in userspace memory, this fits within the CODE_REGISTER_ELF ABI. If we
eventually choose not to map the sframe section into userspace memory
(even though this is not an envisioned use-case at the moment), we can
just extend enum code_opt with a new label.

> 
>>
>> If there are other file types in the future that happen to contain an
>> sframe section (but are not ELF), then we can simply add a new label to
>> enum code_opt.
>>
>>>    
>>>>
>>>> sys_codectl(2)
>>>> =================
>>>>
>>>> * arg0: unsigned int @option:
>>>>
>>>> /* Additional labels can be added to enum code_opt, for extensibility. */
>>>>
>>>> enum code_opt {
>>>>        CODE_REGISTER_ELF,
>>>
>>> Perhaps the above should be: CODE_REGISTER_SFRAME,
>>>
>>> as currently SFrame is read only via files.
>>
>> As I pointed out above, on GNU/Linux, sframe is always an allocated,loaded
>> ELF section. AFAIU, your comment implies that we'd want to support other scenarios
>> where the sframe is in files outside of elf binary sframe sections. Can you
>> expand on the use-case you have for this, or is it just for future-proofing ?
> 
> Heh, I just did above (before reading this). But yeah, it could be. As I
> mentioned above, this is not about ELF files. Sframes just happen to be in
> an ELF file. CODE_REGISTER_ELF sounds like this is for doing special
> actions to an ELF file, when in reality it is doing special actions to tell
> the kernel this is an sframe table. It just happens that sframes are in
> ELF. Let's call it for what it is used for.

I see sframe as one "aspect" of an ELF file. Sure, we could do one
system call for every aspect of an ELF file that we want to register,
but that would require many round trips from userspace to the kernel
every time a library is loaded. In my opinion it makes sense to combine
all aspects of an elf file that we want the kernel to know about into
one registration system call. In that sense, we're not registering just
sframe, but the various aspects of an ELF file, which include sframe.

By the way, the sframe section is optional as well. If we allow
sframe_start and sframe_end to be NULL, this would let libc register
an sframe-less ELF file with its pathname, build-id, and debug info
to the kernel. This would be immediately useful on its own for
distributions that have frame pointers enabled even without sframe
section.

[...]

>>>    
>>>> };
>>>>
>>>> * arg1: void * @info
>>>>
>>>> /* if (@option == CODE_REGISTER_ELF) */
>>>>
>>>> /*
>>>>     * text_start, text_end, sframe_start, sframe_end allow unwinding of the
>>>>     * call stack.
>>>>     *
>>>>     * elf_start, elf_end, pathname, and either build_id or debug_link allows
>>>>     * mapping instruction pointers to file, symbol, offset, and source file
>>>>     * location.
>>>>     */
>>>> struct code_elf_info {
>>>> :   __u64 elf_start;
>>>>        __u64 elf_end;
>>>
>>> Perhaps:
>>>
>>> 	__u64 file_start;
>>> 	__u64 file_end;
>>>
>>> ?
>>>
>>> And call it "struct code_sframe_info"
>>>    
>>>>        __u64 text_start;
>>>>        __u64 text_end;
>>>    
>>>>        __u64 sframe_start;
>>>>        __u64 sframe_end;
>>>
>>> What is the above "sframe" for?
> 
> Still wondering what the above is for.

Well we have an sframe section which is mapped into userspace memory
from sframe_start to sframe_end, which contains the unwind information
that covers the code from text_start to text_end.

Am I unknowingly adding some kind of redundancy here ?

> 
>>>    
>>>>        __u64 pathname;              /* char *, NULL if unavailable. */
>>>>
>>>>        __u64 build_id;              /* char *, NULL if unavailable. */
>>>>        __u64 debug_link_pathname;   /* char *, NULL if unavailable. */
>>>
>>> Maybe just list the above three as "optional" ?
>>
>> This is what I had in mind with "NULL if unavailable", but I can clarify
>> them as being "optional" in the comment.
>>
>> Do you envision that the sizeof(struct code_elf_info) could be smaller
>> and not include the optional fields, or just specifying them as NULL if
>> unavailable is enough ?
> 
> Hmm, are we going to allow this structure to expand? Should we give it a
> size. Or just state that different options could have different sizes (and
> make this more of a union than a structure).

This is extensible. The size of this structure is expected in
arg2: size_t info_size.

Each "@option" label select which structure is expected, and each of
those structures are extensible, with their size expected as arg2.

> 
>>
>>>
>>> It may be available, but the implementer just doesn't want to implement it.
>>>    
>>>>        __u32 build_id_len;
>>>>        __u32 debug_link_crc;
>>>> };
>>>>
>>>>
>>>> /* if (@option == CODE_REGISTER_JIT) */
>>>>
>>>> /*
>>>>     * Registration of sorted JIT unwind table: The reserved memory area is
>>>>     * of size reserved_len. Userspace increases used_len as new code is
>>>>     * populated between text_start and text_end. This area is populated in
>>>>     * increasing address order, and its ABI requires to have no overlapping
>>>>     * fre. This fits the common use-case where JITs populate code into
>>>>     * a given memory area by increasing address order. The sorted unwind
>>>>     * tables can be chained with a singly-linked list as they become full.
>>>>     * Consecutive chained tables are also in sorted text address order.
>>>>     *
>>>>     * Note: if there is an eventual use-case for unsorted jit unwind table,
>>>>     * this would be introduced as a new "code option".
>>>>     */
>>>>
>>>> struct code_jit_info {
>>>>        __u64 text_start;      /* text_start >= addr */
>>>>        __u64 text_end;        /* addr < text_end */
>>>>        __u64 unwind_head;     /* struct code_jit_unwind_table * */
>>>> };
>>>>
>>>> struct code_jit_unwind_fre {
>>>>        /*
>>>>         * Contains info similar to sframe, allowing unwind for a given
>>>>         * code address range.
>>>>         */
>>>>        __u32 size;
>>>>        __u32 ip_off;  /* offset from text_start */
>>>>        __s32 cfa_off;
>>>>        __s32 ra_off;
>>>>        __s32 fp_off;
>>>>        __u8 info;
>>>> };
>>>>
>>>> struct code_jit_unwind_table {
>>>>        __u64 reserved_len;
>>>>        __u64 used_len; /*
>>>>                         * Incremented by userspace (store-release), read by
>>>>                         * the kernel (load-acquire).
>>>>                         */
>>>>        __u64 next;     /* Chain with next struct code_jit_unwind_table. */
>>>>        struct code_jit_unwind_fre fre[];
>>>> };
>>>
>>> I wonder if we should avoid the "jit" portion completely for now until we
>>> know what exactly we need.
>>
>> I don't want to spend too much discussion time on the jit portion at this stage,
>> but I think it's good to keep this in mind so we come up with an ABI that will
>> naturally extend to cover that use case. I favor keeping the JIT portion in these
>> discussions but not implement it initially.
> 
> As long as the structure is flexible to handle this. We could even add the
> JIT enum, but return -EINVAL (or whatever) if it is used to state that it's
> not currently implemented.

Sure, we can indeed initially have the JIT label, and return -EINVAL for
now.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ