[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250721171559.53ea892f@gandalf.local.home>
Date: Mon, 21 Jul 2025 17:15:59 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
bpf@...r.kernel.org, x86@...nel.org, Masami Hiramatsu
<mhiramat@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Jiri Olsa
<jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner
<tglx@...utronix.de>, Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat
<indu.bhagat@...cle.com>, "Jose E. Marchesi" <jemarch@....org>, Beau
Belgrave <beaub@...ux.microsoft.com>, Jens Remus <jremus@...ux.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, Andrew Morton
<akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>, Florian Weimer
<fweimer@...hat.com>, Sam James <sam@...too.org>, Brian Robbins
<brianrob@...rosoft.com>, Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [RFC] New codectl(2) system call for sframe registration
On Mon, 21 Jul 2025 16:58:43 -0400
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> > Honestly, I'm not sure it needs to be an ELF file. Just a file that has an
> > sframe section in it.
>
> Indu told me on IRC that for GNU/Linux, SFrame will be an
> allocated,loaded section in elf files.
Yes it is, but is that a requirement for this interface? I just don't want
to add requirements based on how thing currently work if they are not
needed.
>
> I'm planning to add optional fields (build id, debug link) that are
> ELF-specific. I therefore think it's best that we keep this specific as
> registration of an elf file.
Here's a hypothetical, what if for some reason (say having the sframe
sections outside of the elf file) that the linker shares that?
For instance, if the sframe sections are downloaded separately as a
separate package for a given executable (to make it not mandatory for an
install), the linker could be smart enough to see that they exist in some
special location and then pass that to the kernel. In other words, this is
option is specific for sframe and not ELF. I rather call it by that.
>
> If there are other file types in the future that happen to contain an
> sframe section (but are not ELF), then we can simply add a new label to
> enum code_opt.
>
> >
> >>
> >> sys_codectl(2)
> >> =================
> >>
> >> * arg0: unsigned int @option:
> >>
> >> /* Additional labels can be added to enum code_opt, for extensibility. */
> >>
> >> enum code_opt {
> >> CODE_REGISTER_ELF,
> >
> > Perhaps the above should be: CODE_REGISTER_SFRAME,
> >
> > as currently SFrame is read only via files.
>
> As I pointed out above, on GNU/Linux, sframe is always an allocated,loaded
> ELF section. AFAIU, your comment implies that we'd want to support other scenarios
> where the sframe is in files outside of elf binary sframe sections. Can you
> expand on the use-case you have for this, or is it just for future-proofing ?
Heh, I just did above (before reading this). But yeah, it could be. As I
mentioned above, this is not about ELF files. Sframes just happen to be in
an ELF file. CODE_REGISTER_ELF sounds like this is for doing special
actions to an ELF file, when in reality it is doing special actions to tell
the kernel this is an sframe table. It just happens that sframes are in
ELF. Let's call it for what it is used for.
>
> >
> >> CODE_REGISTER_JIT,
> >
> > From our other conversations, JIT will likely be a completely different
> > format than SFRAME, so calling it just JIT should be fine.
>
> OK
>
> >
> >
> >> CODE_UNREGISTER,
> >
> > I wonder if this should be the first enum. That is, "0" is to unregister.
> >
> > That way, all non-zero options will be for what is being registered, and
> > "0" is for unregistering any of them.
>
> Good idea, I'll do that.
>
> >
> >
> >> };
> >>
> >> * arg1: void * @info
> >>
> >> /* if (@option == CODE_REGISTER_ELF) */
> >>
> >> /*
> >> * text_start, text_end, sframe_start, sframe_end allow unwinding of the
> >> * call stack.
> >> *
> >> * elf_start, elf_end, pathname, and either build_id or debug_link allows
> >> * mapping instruction pointers to file, symbol, offset, and source file
> >> * location.
> >> */
> >> struct code_elf_info {
> >> : __u64 elf_start;
> >> __u64 elf_end;
> >
> > Perhaps:
> >
> > __u64 file_start;
> > __u64 file_end;
> >
> > ?
> >
> > And call it "struct code_sframe_info"
> >
> >> __u64 text_start;
> >> __u64 text_end;
> >
> >> __u64 sframe_start;
> >> __u64 sframe_end;
> >
> > What is the above "sframe" for?
Still wondering what the above is for.
> >
> >> __u64 pathname; /* char *, NULL if unavailable. */
> >>
> >> __u64 build_id; /* char *, NULL if unavailable. */
> >> __u64 debug_link_pathname; /* char *, NULL if unavailable. */
> >
> > Maybe just list the above three as "optional" ?
>
> This is what I had in mind with "NULL if unavailable", but I can clarify
> them as being "optional" in the comment.
>
> Do you envision that the sizeof(struct code_elf_info) could be smaller
> and not include the optional fields, or just specifying them as NULL if
> unavailable is enough ?
Hmm, are we going to allow this structure to expand? Should we give it a
size. Or just state that different options could have different sizes (and
make this more of a union than a structure).
>
> >
> > It may be available, but the implementer just doesn't want to implement it.
> >
> >> __u32 build_id_len;
> >> __u32 debug_link_crc;
> >> };
> >>
> >>
> >> /* if (@option == CODE_REGISTER_JIT) */
> >>
> >> /*
> >> * Registration of sorted JIT unwind table: The reserved memory area is
> >> * of size reserved_len. Userspace increases used_len as new code is
> >> * populated between text_start and text_end. This area is populated in
> >> * increasing address order, and its ABI requires to have no overlapping
> >> * fre. This fits the common use-case where JITs populate code into
> >> * a given memory area by increasing address order. The sorted unwind
> >> * tables can be chained with a singly-linked list as they become full.
> >> * Consecutive chained tables are also in sorted text address order.
> >> *
> >> * Note: if there is an eventual use-case for unsorted jit unwind table,
> >> * this would be introduced as a new "code option".
> >> */
> >>
> >> struct code_jit_info {
> >> __u64 text_start; /* text_start >= addr */
> >> __u64 text_end; /* addr < text_end */
> >> __u64 unwind_head; /* struct code_jit_unwind_table * */
> >> };
> >>
> >> struct code_jit_unwind_fre {
> >> /*
> >> * Contains info similar to sframe, allowing unwind for a given
> >> * code address range.
> >> */
> >> __u32 size;
> >> __u32 ip_off; /* offset from text_start */
> >> __s32 cfa_off;
> >> __s32 ra_off;
> >> __s32 fp_off;
> >> __u8 info;
> >> };
> >>
> >> struct code_jit_unwind_table {
> >> __u64 reserved_len;
> >> __u64 used_len; /*
> >> * Incremented by userspace (store-release), read by
> >> * the kernel (load-acquire).
> >> */
> >> __u64 next; /* Chain with next struct code_jit_unwind_table. */
> >> struct code_jit_unwind_fre fre[];
> >> };
> >
> > I wonder if we should avoid the "jit" portion completely for now until we
> > know what exactly we need.
>
> I don't want to spend too much discussion time on the jit portion at this stage,
> but I think it's good to keep this in mind so we come up with an ABI that will
> naturally extend to cover that use case. I favor keeping the JIT portion in these
> discussions but not implement it initially.
As long as the structure is flexible to handle this. We could even add the
JIT enum, but return -EINVAL (or whatever) if it is used to state that it's
not currently implemented.
-- Steve
Powered by blists - more mailing lists