linux-kernel - Re: [PATCH] tracing: Trace instrumentation begin and end

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230322084834.37ed755e@gandalf.local.home>
Date:   Wed, 22 Mar 2023 08:48:34 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux Trace Kernel <linux-trace-kernel@...r.kernel.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Ross Zwisler <zwisler@...gle.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>
Subject: Re: [PATCH] tracing: Trace instrumentation begin and end

On Wed, 22 Mar 2023 12:19:14 +0100
Thomas Gleixner <tglx@...utronix.de> wrote:

> Steven!
> 
> On Tue, Mar 21 2023 at 21:51, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rostedt@...dmis.org>
> > produces:
> >
> >  2)   0.764 us    |  exit_to_user_mode_prepare();
> >  2)               |  /* page_fault_user: address=0x7fadaba40fd8 ip=0x7fadaba40fd8 error_code=0x14 */
> >  2)   0.581 us    |  down_read_trylock();
> >
> > The "page_fault_user" event is not encapsulated around any function, which
> > means it probably triggered and went back to user space without any trace
> > to know how long that page fault took (the down_read_trylock() is likely to
> > be part of the page fault function, but that's besides the point).
> >
> > To help bring back the old functionality, two trace points are added. One
> > just after instrumentation begins, and one just before it ends. This way,
> > we can see all the time that the kernel can do something meaningful, and we
> > will trace it.  
> 
> Seriously? That's completely insane. Have you actually looked how many
> instrumentation_begin()/end() pairs are in the affected code pathes?
> 
> Obviously not. It's a total of _five_ for every syscall and at least
> _four_ for every interrupt/exception from user mode.
> 
> The number #1 design rule for instrumentation is to be as non-intrusive as
> possible and not to be as lazy as possible.

And it still is. It still uses nops when not enabled. I could even add a
config to only have this compiled in when the config is set, so that
production can disable it if it wants to.

Just in case it's not obvious:

	if (tracepoint_enabled(instrumentation_begin))
		call_trace_instrumentation_begin(ip, pip);

is equivalent to:

	if (static_key_false(&__tracepoint_instrumentation_begin.key))
		call_trace_instrumentation_begin(ip, pip);

We have trace points in preempt_enable/disable() I think that's *far* more
intrusive.

> 
> instrumentation_begin()/end() is solely meant for objtool validation and
> nothing else.
> 
> There are clearly less horrible ways to retrieve the #PF duration, no?

It's not just for #PF, that was just one example. I use to use function
graph tracing max_depth_count=1 to verify NO_HZ_FULL to make sure there's
no entry into the kernel. That doesn't work anymore. Even compat syscalls
are not traced.

I lost a kernel feature with the noinstr push and this is the closest that
comes to bringing it back. And the more we add noinstr, the more the kernel
becomes a black box again.

-- Steve