Message-ID: <CANpmjNOXbWM6seCS9728D+ZXUrF2u+YTCaC7q4ZkHFVM2P+t7Q@mail.gmail.com>
Date: Wed, 10 Apr 2024 09:54:50 +0200
From: Marco Elver <elver@...gle.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, Eric Biederman <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>, Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH] tracing: Add new_exec tracepoint
On Wed, 10 Apr 2024 at 01:54, Masami Hiramatsu <mhiramat@...nel.org> wrote:
>
> On Tue, 9 Apr 2024 16:45:47 +0200
> Marco Elver <elver@...gle.com> wrote:
>
> > On Tue, 9 Apr 2024 at 16:31, Steven Rostedt <rostedt@...dmis.org> wrote:
> > >
> > > On Mon, 8 Apr 2024 11:01:54 +0200
> > > Marco Elver <elver@...gle.com> wrote:
> > >
> > > > Add "new_exec" tracepoint, which is run right after the point of no
> > > > return but before the current task assumes its new exec identity.
> > > >
> > > > Unlike the tracepoint "sched_process_exec", the "new_exec" tracepoint
> > > > runs before flushing the old exec, i.e. while the task still has the
> > > > original state (such as original MM), but when the new exec either
> > > > succeeds or crashes (but never returns to the original exec).
> > > >
> > > > Being able to trace this event can be helpful in a number of use cases:
> > > >
> > > > * allowing tracing eBPF programs access to the original MM on exec,
> > > > before current->mm is replaced;
> > > > * counting exec in the original task (via perf event);
> > > > * profiling flush time ("new_exec" to "sched_process_exec").
> > > >
> > > > Example of tracing output ("new_exec" and "sched_process_exec"):
> > >
> > > How common is this? And can't you just do the same with adding a kprobe?
> >
> > Our main use case would be to use this in BPF programs to become
> > exec-aware, where using the sched_process_exec hook is too late. This
> > is particularly important where the BPF program must stop inspecting
> > the user space's VM when the task does exec to become a new process.
>
> Just out of curiosity, would you like to audit that the user-program
> is not malformed? (security tracepoint?) I think that is an interesting
> idea. What kind of information do you need?

I didn't have that in mind. If the BPF program reads (or even writes)
user space memory, it must stop doing so before current->mm is
switched; otherwise the result is random data or memory corruption.
The new process may reuse the memory that we wanted to inspect, but a
user space process must explicitly opt in to being inspected or
manipulated, and the new process has not done so. Just like the kernel
"flushes" various old state on exec because the task is becoming a new
process, a BPF program that keeps per-process state needs to do the
same.
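
As a rough illustration of the "flush per-process state on exec" idea
above, here is a minimal sketch in plain user-space C. It is a model
only, not a BPF program and not kernel code: the table, the struct, and
every tracer_* function name are hypothetical, and the call to
tracer_on_new_exec() merely stands in for whatever handler would be
attached to the proposed "new_exec" tracepoint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64

/* Hypothetical per-process state kept by a tracer. */
struct proc_state {
    int pid;             /* 0 marks a free slot */
    uintptr_t user_addr; /* user address the process opted in to share */
};

static struct proc_state tracked[MAX_TRACKED];

/* Find the slot for pid, or NULL if the process is not tracked. */
static struct proc_state *tracer_lookup(int pid)
{
    if (pid <= 0)
        return NULL;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].pid == pid)
            return &tracked[i];
    return NULL;
}

/* The process explicitly opts in to inspection of user_addr. */
static bool tracer_opt_in(int pid, uintptr_t user_addr)
{
    struct proc_state *st;

    if (pid <= 0)
        return false;
    st = tracer_lookup(pid);
    for (size_t i = 0; !st && i < MAX_TRACKED; i++)
        if (tracked[i].pid == 0)
            st = &tracked[i];
    if (!st)
        return false; /* table full */
    st->pid = pid;
    st->user_addr = user_addr;
    return true;
}

/*
 * Stand-in for a "new_exec" handler: runs while the old MM is still
 * current, and drops all state tied to the old exec identity.
 */
static void tracer_on_new_exec(int pid)
{
    struct proc_state *st = tracer_lookup(pid);

    if (st) {
        st->pid = 0;
        st->user_addr = 0;
    }
}

/* The tracer may touch user memory only while opt-in state exists. */
static bool tracer_may_access(int pid)
{
    return tracer_lookup(pid) != NULL;
}
```

In this model the same PID loses access as soon as the exec handler
runs, and regains it only if the new program opts in again, which
mirrors the ordering requirement described above.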