Date:	Tue, 16 Mar 2010 21:56:28 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Paul Mackerras <paulus@...ba.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	benh@...nel.crashing.org, linux-kernel@...r.kernel.org,
	anton@...ba.org, linuxppc-dev@...abs.org
Subject: Re: [PATCH] powerpc/perf_events: Implement
	perf_arch_fetch_caller_regs for powerpc

On Tue, Mar 16, 2010 at 02:22:13PM +1100, Paul Mackerras wrote:
> On Mon, Mar 15, 2010 at 10:04:54PM +0100, Frederic Weisbecker wrote:
> > On Mon, Mar 15, 2010 at 04:46:15PM +1100, Paul Mackerras wrote:
> 
> > >     14.99%            perf  [kernel.kallsyms]  [k] ._raw_spin_lock
> > >                       |
> > >                       --- ._raw_spin_lock
> > >                          |          
> > >                          |--25.00%-- .alloc_fd
> > >                          |          (nil)
> > >                          |          |          
> > >                          |          |--50.00%-- .anon_inode_getfd
> > >                          |          |          .sys_perf_event_open
> > >                          |          |          syscall_exit
> > >                          |          |          syscall
> > >                          |          |          create_counter
> > >                          |          |          __cmd_record
> > >                          |          |          run_builtin
> > >                          |          |          main
> > >                          |          |          0xfd2e704
> > >                          |          |          0xfd2e8c0
> > >                          |          |          (nil)
> > > 
> > > ... etc.
> > > 
> > > Signed-off-by: Paul Mackerras <paulus@...ba.org>
> > 
> > 
> > Cool!
> 
> By the way, I notice that gcc tends to inline the tracing functions,
> which means that by going up 2 stack frames we miss some of the
> functions.  For example, for the lock:lock_acquire event, we have
> _raw_spin_lock() -> lock_acquire() -> trace_lock_acquire() ->
> perf_trace_lock_acquire() -> perf_trace_templ_lock_acquire() ->
> perf_fetch_caller_regs() -> perf_arch_fetch_caller_regs().
> 
> But in the ppc64 kernel binary I just built, gcc inlined
> trace_lock_acquire in lock_acquire, and perf_trace_templ_lock_acquire
> in perf_trace_lock_acquire.  Given that perf_fetch_caller_regs is
> explicitly inlined, going up two levels from perf_fetch_caller_regs
> gets us to _raw_spin_lock, whereas I think you intended it to get us
> to trace_lock_acquire.  I'm not sure what to do about that - any
> thoughts?



Yeah, I've seen this too. The main problem is that
perf_trace_templ_lock_acquire may or may not be inlined.

It is used for trace events defined through TRACE_EVENT_CLASS,
where we define one event structure pattern that is shared
among several events.

For example, events A and B share perf_trace_templ_foo.
Each has its own perf_trace_blah, but both of those
perf_trace_blah call the same perf_trace_templ_foo(),
so in this case it won't be inlined.

Events that don't share a pattern will have their
perf_trace_templ inlined, because there is an exclusive 1:1
relationship between perf_trace_blah and perf_trace_templ_blah.

The rewind of 2 is well suited for events sharing a pattern: ip
will match the right event source, and not one of its callers.
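
To make the rewind concrete, here is a rough user-space model (not
kernel code) of the lock_acquire chain quoted above. It only counts
frames: a function marked inlined shares the frame of its nearest
out-of-line caller. struct call, frame_owner() and rewind_frames()
are names made up for this sketch, and the inlining flags are
assumptions taken from the two cases discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHAIN_LEN 6

/* One entry of a source-level call chain, outermost caller first. */
struct call {
	const char *name;
	int inlined;	/* 1 if the compiler merged it into its caller */
};

/* Template kept out of line (shared by several events). */
const struct call templ_out_of_line[CHAIN_LEN] = {
	{ "_raw_spin_lock", 0 },
	{ "lock_acquire", 0 },
	{ "trace_lock_acquire", 1 },
	{ "perf_trace_lock_acquire", 0 },
	{ "perf_trace_templ_lock_acquire", 0 },
	{ "perf_fetch_caller_regs", 1 },
};

/* Template inlined into perf_trace_lock_acquire. */
const struct call templ_inlined[CHAIN_LEN] = {
	{ "_raw_spin_lock", 0 },
	{ "lock_acquire", 0 },
	{ "trace_lock_acquire", 1 },
	{ "perf_trace_lock_acquire", 0 },
	{ "perf_trace_templ_lock_acquire", 1 },
	{ "perf_fetch_caller_regs", 1 },
};

/* Index of the out-of-line function whose frame contains entry i. */
size_t frame_owner(const struct call *chain, size_t i)
{
	while (i > 0 && chain[i].inlined)
		i--;
	return i;
}

/* Walk `skip` real frames up from the frame containing `start`. */
const char *rewind_frames(const struct call *chain, size_t start,
			  unsigned int skip)
{
	size_t i;

	assert(chain != NULL);
	i = frame_owner(chain, start);
	while (skip-- > 0 && i > 0)
		i = frame_owner(chain, i - 1);
	return chain[i].name;
}

/* Convenience check: does a rewind of `skip` land on `expected`? */
int lands_on(const struct call *chain, unsigned int skip,
	     const char *expected)
{
	return strcmp(rewind_frames(chain, CHAIN_LEN - 1, skip),
		      expected) == 0;
}
```

In this model, rewind_frames(templ_out_of_line, CHAIN_LEN - 1, 2)
stops at the lock_acquire frame (which also holds the inlined
trace_lock_acquire), while the same rewind on templ_inlined overshoots
to _raw_spin_lock.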

Unfortunately, the others are less lucky.
I haven't worried much about this so far because it had no bad effect
on lock events. Quite the opposite, actually: it's not very interesting
to have lock_acquire as the event source unless you have a callchain.

If you have no callchain, you'll see a lot of entries like these in
perf report:

sym1	lock_acquire
sym2	lock_acquire
sym3	lock_acquire

What you want here is the function that called lock_acquire.

But if you have a callchain it's fine, because you have the nature
of the event (lock_acquire) and the origin as well.

That said, lock events are an exception where the mistake
has a lucky result. Other inlined events are harmed, as we lose
their most important caller. So I'm going to fix that.

I can just fetch the regs from perf_trace_foo() and pass them
to perf_trace_templ_foo(), and that should do it.
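
Roughly, the idea looks like the following user-space sketch. It is
only a model, not the actual patch: struct fake_regs, the shadow
stack, enter()/leave() and the skip count of 1 are all invented here
to show that the recorded source no longer depends on whether the
template got its own frame. It also assumes the tracepoint wrapper is
inlined into the event source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16

/* Hypothetical stand-in for struct pt_regs: just the caller's name. */
struct fake_regs {
	const char *ip;
};

/* Shadow stack modelling real frames; inlined functions never push. */
static const char *shadow_stack[MAX_DEPTH];
static size_t depth;
static const char *last_sample_ip;

void enter(const char *fn)
{
	assert(depth < MAX_DEPTH);
	shadow_stack[depth++] = fn;
}

void leave(void)
{
	assert(depth > 0);
	depth--;
}

/* Record the frame `skip` levels above the current one. */
void fetch_caller_regs(struct fake_regs *regs, size_t skip)
{
	assert(skip < depth);
	regs->ip = shadow_stack[depth - 1 - skip];
}

/*
 * Shared body. It no longer rewinds anything itself, it only uses the
 * regs it was handed. has_frame models "not inlined".
 */
void perf_trace_templ_foo(const struct fake_regs *regs, int has_frame)
{
	if (has_frame)
		enter("perf_trace_templ_foo");
	last_sample_ip = regs->ip;
	if (has_frame)
		leave();
}

/* Per-event entry point: fetch the regs here, then pass them down. */
void perf_trace_foo(int templ_has_frame)
{
	struct fake_regs regs;

	enter("perf_trace_foo");
	fetch_caller_regs(&regs, 1);	/* one frame up: the event source */
	perf_trace_templ_foo(&regs, templ_has_frame);
	leave();
}

/* Fire the event from a fake source and return what was recorded. */
const char *fire_event(int templ_has_frame)
{
	enter("event_source");
	perf_trace_foo(templ_has_frame);
	leave();
	return last_sample_ip;
}

/* 1 if both inlining variants record the same event source. */
int same_source_either_way(void)
{
	return strcmp(fire_event(0), fire_event(1)) == 0;
}
```

Because the rewind happens in perf_trace_foo(), which sits at a fixed
distance from the event source in this model, fire_event(0) and
fire_event(1) both record "event_source".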

Thanks.
