Message-ID: <47a43d27-7eac-4f88-a783-afdd3a97bb11@efficios.com>
Date: Wed, 2 Jul 2025 15:12:45 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org, x86@...nel.org,
Masami Hiramatsu <mhiramat@...nel.org>, Josh Poimboeuf
<jpoimboe@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Andrii Nakryiko <andrii@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>, "Jose E. Marchesi" <jemarch@....org>,
Beau Belgrave <beaub@...ux.microsoft.com>, Jens Remus
<jremus@...ux.ibm.com>, Andrew Morton <akpm@...ux-foundation.org>,
Jens Axboe <axboe@...nel.dk>, Florian Weimer <fweimer@...hat.com>
Subject: Re: [PATCH v12 06/14] unwind_user/deferred: Add deferred unwinding
interface
On 2025-07-02 15:05, Steven Rostedt wrote:
> On Wed, 2 Jul 2025 14:47:10 -0400
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>>
>> AFAIR, one of the goals here is to save the cookie into the trace
>> to allow trace post-processing to link the event triggering the
>> unwinding with the deferred unwinding data.
>>
>> In order to make the trace analysis results reliable, we'd like
>> to avoid the following causes of uncertainty, which would
>> mistakenly cause the post-processing analysis to associate
>> a stack trace with the wrong event:
>>
>> - Thread ID re-use (exit + clone/fork),
>> - Thread migration,
>> - Events discarded (e.g. buffer full) causing missing
>> thread lifetime events or missing unwind-related events.
>>
>> Unless I'm missing something, the per-thread counter would have
>> issues with thread ID re-use during the trace lifetime.
>
> But you are missing one more thing that the trace can use, and that's
> the time sequence. As soon as the same thread has a new id you can
> assume all the older user space traces are not applicable for any new
> events for that thread, or any other thread with the same thread ID.

In order for the scheme you describe to work, you need:

- instrumentation of task lifetime (exit/fork+clone),
- assurance that the events related to that instrumentation were not
  dropped.

I'm not sure about ftrace, but in LTTng, enabling instrumentation of
task lifetime is entirely up to the user. And even if it's enabled,
events can be discarded (e.g. buffer full).
>
> Thus the only issue that can truly be a problem is if you have missed
> events where thread id wraps around. I guess that could be possible if
> a long-running task finally exits and its thread ID is reused
> immediately. Is that a common occurrence?
You just need a combination of thread ID re-use and either missing
task lifetime instrumentation or discarded events to trigger this.

Even if it is infrequent on any single system, I suspect that at large
scale and in production this will happen quite often. The task IDs
don't even need to be re-used very quickly for this to be an issue.
Thanks,
Mathieu
>
> -- Steve.
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com