Message-ID: <20230322084834.37ed755e@gandalf.local.home>
Date: Wed, 22 Mar 2023 08:48:34 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Ross Zwisler <zwisler@...gle.com>,
Joel Fernandes <joel@...lfernandes.org>,
"Paul E. McKenney" <paulmck@...nel.org>,
Miroslav Benes <mbenes@...e.cz>
Subject: Re: [PATCH] tracing: Trace instrumentation begin and end
On Wed, 22 Mar 2023 12:19:14 +0100
Thomas Gleixner <tglx@...utronix.de> wrote:
> Steven!
>
> On Tue, Mar 21 2023 at 21:51, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@...dmis.org>
> > produces:
> >
> > 2) 0.764 us | exit_to_user_mode_prepare();
> > 2) | /* page_fault_user: address=0x7fadaba40fd8 ip=0x7fadaba40fd8 error_code=0x14 */
> > 2) 0.581 us | down_read_trylock();
> >
> > The "page_fault_user" event is not encapsulated around any function, which
> > means it probably triggered and went back to user space without any trace
> > to know how long that page fault took (the down_read_trylock() is likely to
> > be part of the page fault function, but that's besides the point).
> >
> > To help bring back the old functionality, two trace points are added. One
> > just after instrumentation begins, and one just before it ends. This way,
> > we can see all the time that the kernel can do something meaningful, and we
> > will trace it.
>
> Seriously? That's completely insane. Have you actually looked how many
> instrumentation_begin()/end() pairs are in the affected code pathes?
>
> Obviously not. It's a total of _five_ for every syscall and at least
> _four_ for every interrupt/exception from user mode.
>
> The number #1 design rule for instrumentation is to be as non-intrusive as
> possible and not to be as lazy as possible.

And it still is. It still uses nops when not enabled. I could even add a
config to only have this compiled in when the config is set, so that
production can disable it if it wants to.

Just in case it's not obvious:

	if (tracepoint_enabled(instrumentation_begin))
		call_trace_instrumentation_begin(ip, pip);

is equivalent to:

	if (static_key_false(&__tracepoint_instrumentation_begin.key))
		call_trace_instrumentation_begin(ip, pip);

We have trace points in preempt_enable/disable(). I think that's *far* more
intrusive.
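
For anyone following along outside the kernel tree, here is a rough
user-space sketch of the guard pattern above. It is only a model: the
fake_* names, the plain bool flag and the trace_hits counter are made up
for illustration, and it assumes GCC or Clang for __builtin_expect(). The
real static key does not load a flag from memory at all; the kernel
patches a nop into a jump when the tracepoint is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct static_key. The kernel patches code
 * at runtime; this model just reads a flag. */
struct fake_static_key {
	bool enabled;
};

/* Hypothetical stand-in for the tracepoint descriptor. */
struct fake_tracepoint {
	struct fake_static_key key;
};

static struct fake_tracepoint tp_instrumentation_begin;
static unsigned long trace_hits;	/* counts calls into the slow path */

/* Models static_key_false(): the key is expected to be off. */
static bool fake_static_key_false(const struct fake_static_key *key)
{
	return __builtin_expect(key->enabled, 0);
}

/* Models tracepoint_enabled(): a thin wrapper around the key test. */
#define fake_tracepoint_enabled(tp) fake_static_key_false(&(tp).key)

/* Slow path, reached only when the tracepoint is enabled. */
static void call_trace_instrumentation_begin(unsigned long ip,
					     unsigned long pip)
{
	(void)ip;
	(void)pip;
	trace_hits++;
}

/* The guard: one predicted-not-taken branch when tracing is off. */
static void fake_instrumentation_begin(unsigned long ip, unsigned long pip)
{
	if (fake_tracepoint_enabled(tp_instrumentation_begin))
		call_trace_instrumentation_begin(ip, pip);
}

/* Usage sketch: nothing is recorded until the key is switched on. */
static unsigned long demo(void)
{
	fake_instrumentation_begin(0x1000, 0x2000);	/* disabled: skipped */
	assert(trace_hits == 0);

	tp_instrumentation_begin.key.enabled = true;
	fake_instrumentation_begin(0x1000, 0x2000);	/* enabled: recorded */
	return trace_hits;
}
```

The point of the model is only that the disabled case never reaches the
out-of-line call; it says nothing about the real cost of the nop.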
>
> instrumentation_begin()/end() is solely meant for objtool validation and
> nothing else.
>
> There are clearly less horrible ways to retrieve the #PF duration, no?
It's not just for #PF, that was just one example. I used to use function
graph tracing max_depth_count=1 to verify NO_HZ_FULL to make sure there's
no entry into the kernel. That doesn't work anymore. Even compat syscalls
are not traced.
I lost a kernel feature with the noinstr push and this is the closest thing
to bringing it back. And the more we add noinstr, the more the kernel
becomes a black box again.
-- Steve