[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEf4BzbThkFEb88Jj8pMxuMkHeWmJNMnRT5j6SA4HmN4g+-gJA@mail.gmail.com>
Date: Wed, 2 Apr 2025 08:56:55 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, Andrii Nakryiko <andrii@...nel.org>,
linux-trace-kernel@...r.kernel.org, peterz@...radead.org, mingo@...nel.org,
bpf@...r.kernel.org, linux-kernel@...r.kernel.org, kernel-team@...a.com,
oleg@...hat.com, brauner@...nel.org, glider@...gle.com, mhiramat@...nel.org,
mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org
Subject: Re: [PATCH] exit: add trace_task_exit() tracepoint before current->mm
is reset
On Wed, Apr 2, 2025 at 12:20 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Tue 01-04-25 15:04:11, Andrii Nakryiko wrote:
> > On Tue, Apr 1, 2025 at 2:31 PM Steven Rostedt <rostedt@...dmis.org> wrote:
> > >
> > > On Tue, 1 Apr 2025 11:40:21 -0700
> > > Andrii Nakryiko <andrii@...nel.org> wrote:
> > >
> > > Hi Andrii,
> > >
> > > > It is useful to be able to access current->mm to, say, record a bunch of
> > > > VMA information right before the task exits (e.g., for stack
> > > > symbolization reasons when dealing with short-lived processes that exit
> > > > in the middle of profiling session). We currently do have
> > > > trace_sched_process_exit() in the exit path, but it is called a bit too
> > > > late, after exit_mm() resets current->mm to NULL, which makes it
> > > > unsuitable for inspecting and recording task's mm_struct-related data
> > > > when tracing process lifetimes.
> > >
> > > My fear of adding another task exit trace event is that it will get a
> > > bit confusing as that we now have trace_sched_process_exit() and also
> > > trace_task_exit() with slightly different semantics.
> > >
> > > How about adding a trace_exit_mm()? Add that to the exit_mm() code?
> >
> > This is kind of the worst of both worlds, no? We still have a new
> > tracepoint, but this one can't tell if it's a `group_dead` situation
> > or not... I can pass group_dead into exit_mm(), but it will be just
> > for the sake of that new tracepoint.
>
> Is it important to tell the difference between thread and the
> whole process group exiting?
Yes, it often is important. In the sense that process group exiting
would trigger extra information gathering/aggregation/sending compared
to just process (thread) existing. Both are important to track, but
it's useful to be able to differentiate.
>
> Please keep in mind that even group exit doesn't really imply the mm is
> going away (clone allows CLONE_VM without CLONE_SIGNAL - i.e. mm could
> be shared outside of thread group).
Yep, I realize that, and as Steven said, for cases like that a
dedicated mm-specific tracepoint might be useful. But I'm not really
aware of use cases caring about mm_struct itself, in isolation. It's
all usually in the context of thread/process exit, and mm-related
information is one of a bunch of extra information that's useful at
that point. It's just so that with current sched_process_exit()
tracepoint placement we (e.g., BPF program) gets control a bit too
late.
But it seems like everyone is OK just shifting sched_process_exit() to
before exit_mm(), which should cover the use cases I had in mind. We
can always add mm-specific tracepoint when the need arises.
Thanks everyone for discussion and suggestions!
> --
> Michal Hocko
> SUSE Labs
Powered by blists - more mailing lists