Message-ID: <20250722151127.0b64d3b6@batman.local.home>
Date: Tue, 22 Jul 2025 15:11:27 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
bpf@...r.kernel.org, x86@...nel.org, Masami Hiramatsu
<mhiramat@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Jiri Olsa
<jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner
<tglx@...utronix.de>, Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat
<indu.bhagat@...cle.com>, "Jose E. Marchesi" <jemarch@....org>, Beau
Belgrave <beaub@...ux.microsoft.com>, Jens Remus <jremus@...ux.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, Andrew Morton
<akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>, Florian Weimer
<fweimer@...hat.com>, Sam James <sam@...too.org>, Brian Robbins
<brianrob@...rosoft.com>, Elena Zannoni <elena.zannoni@...cle.com>
Subject: Re: [RFC] New codectl(2) system call for sframe registration
Florian, you may want to read this email, as there's some question about
dynamic linking.
On Tue, 22 Jul 2025 14:26:44 -0400
Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> >
> > I'm looking for a mapping between already loaded text memory to how to
> > unwind it that will be in an sframe format somewhere on disk.
>
> OK, so what you have in mind is the compressed sframe use-case.
>
> Ideally, for the compressed sframe use-case I suspect we'd want to do
> lazy, on-demand decompression, which could decompress only the parts that
> are needed for the unwind, rather than expand everything in memory.
>
> Pointing the kernel to a file/offset on disk is rather different than
> the current ELF sframe section scenario, where it is allocated,loaded
> into the process' address space. I suspect we would want to cover this
> with a future new code_opt enum label.
The sframe program header is of type PT_GNU_SFRAME and not PT_LOAD, so
the dynamic linker will not be loading it. The code in the kernel has to
do something special with this section. It's not automatic.
So yes, I never had any expectation that the dynamic linker would even
load sframes into memory. It would simply tell the kernel where to find
them, and the kernel would load them.
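To make this concrete, here is a minimal user-space sketch, assuming a 64-bit ELF and glibc's <elf.h>, of how code walking the program header table would locate the sframe data: it looks for the dedicated PT_GNU_SFRAME entry and records its file offset, address and size. The fallback value of PT_GNU_SFRAME is the binutils one and is an assumption in case your <elf.h> lacks it; struct sframe_loc and find_sframe_phdr() are hypothetical names, not kernel code.

```c
#include <elf.h>
#include <stddef.h>

/* Assumed fallback (binutils: PT_LOOS + 0x474e554) for older <elf.h>. */
#ifndef PT_GNU_SFRAME
#define PT_GNU_SFRAME 0x6474e554
#endif

/* Hypothetical: where the sframe data described by the header lives. */
struct sframe_loc {
	Elf64_Off   file_offset; /* offset of the data in the file */
	Elf64_Addr  vaddr;       /* link-time virtual address */
	Elf64_Xword size;        /* size of the data in the file */
};

/*
 * Scan the program header table. Return 0 and fill *loc if a
 * PT_GNU_SFRAME entry exists, -1 otherwise. Entries of any other
 * type (including PT_LOAD) are skipped.
 */
static int find_sframe_phdr(const Elf64_Phdr *phdr, size_t phnum,
			    struct sframe_loc *loc)
{
	for (size_t i = 0; i < phnum; i++) {
		if (phdr[i].p_type != PT_GNU_SFRAME)
			continue;
		loc->file_offset = phdr[i].p_offset;
		loc->vaddr = phdr[i].p_vaddr;
		loc->size = phdr[i].p_filesz;
		return 0;
	}
	return -1;
}
```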
> >
> > Yes, but we are not registering ELF. We are registering how to unwind
> > something with sframes. If it's not sframes we are registering, what is
> > it?
>
> I am thinking of sframes as one of the properties of an ELF executable.
> So from my perspective we are registering an ELF file with various
> properties, one of which is its sframe section.
That wasn't what I was thinking.
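The model described here, registering how to unwind a range of loaded text rather than registering an ELF file, can be sketched as a lookup table keyed by text address. The structure and function names below are hypothetical and not part of the proposed codectl(2) interface; the sketch assumes entries do not overlap and are kept sorted by text_start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one range of loaded text and its sframe data. */
struct text_unwind_map {
	uintptr_t text_start;   /* first byte of the text range */
	uintptr_t text_end;     /* one past the last byte */
	uintptr_t sframe_start; /* unwind data for this range */
	uintptr_t sframe_end;
};

/*
 * Binary search over non-overlapping entries sorted by text_start.
 * Returns the entry whose text range contains pc, or NULL.
 */
static const struct text_unwind_map *
lookup_unwind(const struct text_unwind_map *map, size_t n, uintptr_t pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pc < map[mid].text_start)
			hi = mid;
		else if (pc >= map[mid].text_end)
			lo = mid + 1;
		else
			return &map[mid];
	}
	return NULL;
}
```

Nothing in such a table needs to know that the text came from an ELF file; the only question it answers is "how do I unwind this address?".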
>
> But I think I get where you are getting at: if we define the sframe
> registration for ELF as sframe_start, sframe_end, then it forgoes
> approaches where sframe is provided through other means, such as
> pathname and offset, which would be useful for the compressed sframe
> use-case.
>
> If system call overhead is not too much of an issue at library load,
> then we could break this down into multiple system calls, e.g.
> eventually:
>
> codectl(CODE_REGISTER_SFRAME, /* provide sframe start + end */ )
> codectl(CODE_REGISTER_ELF, /* provide elf-specific info such as build id */ )
IIRC, and Florian (who has been Cc'd) can correct me if I'm wrong,
dynamic file loading is quite a slow process, and a few extra system
calls aren't going to show up above the noise.
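The split registration from the pseudocode above could look roughly like this on the loader side: one call per registered aspect, so each library costs two calls instead of one. codectl(2) is only an RFC, so the operation values, argument structs and codectl_stub() below are all hypothetical stand-ins; the stub merely checks the argument size and counts calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation codes; the values are invented. */
enum code_op {
	CODE_REGISTER_SFRAME = 1,
	CODE_REGISTER_ELF    = 2,
};

struct code_sframe_info {
	uintptr_t text_start, text_end;     /* text being described */
	uintptr_t sframe_start, sframe_end; /* sframe data for it */
};

struct code_elf_info {
	uintptr_t text_start, text_end;
	unsigned char build_id[20];         /* hypothetical fixed buffer */
	size_t build_id_len;
};

/* Stand-in for the proposed system call: counts calls, checks sizes. */
static unsigned int codectl_calls;

static int codectl_stub(enum code_op op, const void *info, size_t len)
{
	(void)info;
	codectl_calls++;
	if (op == CODE_REGISTER_SFRAME && len == sizeof(struct code_sframe_info))
		return 0;
	if (op == CODE_REGISTER_ELF && len == sizeof(struct code_elf_info))
		return 0;
	return -1;
}

/* What a dynamic loader might do per library: one call per aspect. */
static int register_library(const struct code_sframe_info *sf,
			    const struct code_elf_info *elf)
{
	if (codectl_stub(CODE_REGISTER_SFRAME, sf, sizeof(*sf)) != 0)
		return -1;
	return codectl_stub(CODE_REGISTER_ELF, elf, sizeof(*elf));
}
```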
> > The system call is for the dynamic linker to let the kernel know where
> > it can find the sframes for newly loaded text.
>
> I am saying this is a "new" model because the current sframe section is
> allocated,loaded, which means it is present in userspace memory, so it
> seems rather logical to delimit this area with pointers to the start/end
> of that range.
But it's the kernel that maps it into memory. I was expecting that the
kernel would map it into memory again, just like it does with the ELF
file. I wasn't expecting the dynamic linker to.
> >
> > Actually, the sframe section shouldn't be mapped into user space
> > memory. The kernel will be doing that, not the linker.
>
> AFAIU, that's not how the sframe section works today. It's allocated,loaded.
> So userspace maps the section into its address space, and the kernel takes
> the page faults when it needs to load its content.
Yes, but the kernel maps it. I wasn't expecting the user space dynamic
linker to map it. I was expecting the system call to simply say "here's
where the sframe section is in this file" and the kernel would take
care of the rest.
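The model where the call only says "here's where the sframe section is in this file" and the mapping is done elsewhere can be approximated in user space with a read-only mmap(). This is a sketch with hypothetical names, not what the kernel would do internally; it shows the one detail any such mapping has to handle: mmap() needs a page-aligned file offset, so the offset is rounded down and the section pointer adjusted.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A read-only view of [offset, offset + size) of an open file. */
struct section_view {
	void *map_base;   /* what mmap() returned (page aligned) */
	size_t map_len;   /* length passed to mmap() */
	const void *data; /* start of the requested section */
};

/*
 * mmap() requires a page-aligned file offset, so round the offset
 * down and remember how far into the mapping the section starts.
 */
static int map_section(int fd, off_t offset, size_t size,
		       struct section_view *v)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t aligned = offset & ~((off_t)page - 1);
	size_t delta = (size_t)(offset - aligned);

	v->map_len = size + delta;
	v->map_base = mmap(NULL, v->map_len, PROT_READ, MAP_PRIVATE,
			   fd, aligned);
	if (v->map_base == MAP_FAILED)
		return -1;
	v->data = (const char *)v->map_base + delta;
	return 0;
}

static void unmap_section(struct section_view *v)
{
	munmap(v->map_base, v->map_len);
}
```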
>
>
> > I would say that
> > the system call can give a hint of where it would like it mapped, but
> > it should allow the kernel to decide where to map it as the user space
> > code doesn't care where it gets mapped.
>
> AFAIU currently the dynamic loader maps the section, not the kernel.
You mean the prctl()?
I haven't looked too deeply into that system call. It may do that
currently. I'm just thinking about what the best way to do this is. I
guess we should ask Florian which is best for the dynamic linker:
whether it should map it in, or whether the kernel should, keeping a
compressed format in mind as well.
>
> >
> > In the future, if we want to compress the sframe section, it will not
> > even be a loadable ELF section. But the system call can tell the
> > kernel: "there's a compressed sframe section at this offset/size in
> > this file" for this text address range, and then the kernel will do
> > the rest.
>
> I would see this compressed side-file handled entirely from the kernel
> (not mapped in userspace) as a new enum code_opt option.
Yes, it would likely be a new enum.
But if the dynamic linker has already mapped the sframe into memory and
is giving it to the kernel, then it is even less of an "elf" file. It's
simply an sframe section in memory that describes some text in memory.
The dynamic linker will still map everything the way it normally does.
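A separate enum code_opt value per model could look roughly like the following tagged union: one member for sframe data that user space has already mapped, and one for data the kernel reads itself from a file (the compressed side-file case). Everything here is an assumption for illustration: the enum values, the struct layout, and identifying the file by a descriptor rather than a pathname.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option values for the two registration models. */
enum code_opt {
	CODE_OPT_SFRAME_MEM,  /* sframe already mapped by user space */
	CODE_OPT_SFRAME_FILE, /* kernel reads sframe from fd + offset */
};

struct code_sframe_reg {
	enum code_opt opt;
	uintptr_t text_start, text_end; /* text the data describes */
	union {
		struct { uintptr_t start, end; } mem;
		struct { int fd; uint64_t offset, size; } file;
	} u;
};

/* Basic sanity checks a receiver of either form would plausibly need. */
static int validate_sframe_reg(const struct code_sframe_reg *r)
{
	if (r->text_start >= r->text_end)
		return -1;
	switch (r->opt) {
	case CODE_OPT_SFRAME_MEM:
		return r->u.mem.start < r->u.mem.end ? 0 : -1;
	case CODE_OPT_SFRAME_FILE:
		if (r->u.file.fd < 0 || r->u.file.size == 0)
			return -1;
		/* Reject ranges where offset + size would overflow. */
		return r->u.file.offset <= UINT64_MAX - r->u.file.size ? 0 : -1;
	}
	return -1;
}
```

Either way the text range is the key; only the description of where the unwind data lives changes between the two options.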
>
> >
> >>
> >> Am I unknowingly adding some kind of redundancy here?
> >>
> >
> > Maybe. This system call was meant to add unwinding information for the kernel.
> > It looks like you are having it be much more than that. I'm not against
> > that, but that should only be for extensions, and currently, this is
> > supposed to only make sframes work.
>
> I agree that if we state that "elf" registration has sframe_start/end
> as a means to express sframe, then we are stuck with a model where userspace
> needs to map the section in its memory. Considering that you want to
> express different models where a filename and offset is provided to the
> kernel instead, then it makes sense to make the registration more specific.
>
> The downside would be that we may have to do more than one system call if we
> want to register more than one "aspect", e.g. sframe vs elf build-id.
>
> I think the overhead of a single vs a few system calls is an important
> aspect to consider. If the overhead of a few more system calls at library
> load does not matter too much, then we should go for the more specific
> registration. I have no clue whether that overhead matters in practice though.
If the linker needs to map it, it is already doing lots of system calls
to accomplish that ;-)
-- Steve