Message-Id: <1265188475-23509-1-git-send-regression-fweisbec@gmail.com>
Date: Wed, 3 Feb 2010 10:14:24 +0100
From: Frederic Weisbecker <fweisbec@...il.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: LKML <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Paul Mackerras <paulus@...ba.org>,
Hitoshi Mitake <mitake@....info.waseda.ac.jp>,
Li Zefan <lizf@...fujitsu.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Masami Hiramatsu <mhiramat@...hat.com>,
Jens Axboe <jens.axboe@...cle.com>
Subject: [RFC GIT PULL] perf/trace/lock optimization/scalability improvements
Hi,
There are many things going on in this patchset, addressing
different problems:

- remove most of the string copy overhead in the fast path
  (a sketch of the idea follows this list)
- open the way for lock class oriented profiling (as
  opposed to lock instance profiling; both can be useful
  in different ways)
- remove the buffer multiplexing (less contention)
- event injection support
- remove the violent lock event recursions (only 2 of the 3
  cases; the remaining one is detailed below)
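
To make the first point concrete, here is a minimal sketch of the
idea (hypothetical event and field names, not the actual patch): the
fast path records a pointer-sized class id instead of copying the
lock name string into every event, and the name itself is emitted
only once through an injected lock_class_init event.

/*
 * Sketch only: a lock event keyed by class id rather than by name.
 * Assumes the usual TRACE_EVENT() machinery from include/trace/.
 */
TRACE_EVENT(lock_acquire_sketch,

	TP_PROTO(struct lockdep_map *lock),

	TP_ARGS(lock),

	TP_STRUCT__entry(
		/* pointer-sized id, no __string()/string copy here */
		__field(const void *, class_id)
	),

	TP_fast_assign(
		__entry->class_id = lock->key;
	),

	TP_printk("class=%p", __entry->class_id)
);

The tool side can then resolve class_id -> name at report time from
the lock_class_init events.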
Some numbers, obtained by running:

	perf lock record perf bench sched pipe -l 100000

Before the patchset:

	Total time: 91.015 [sec]
	910.157300 usecs/op
	1098 ops/sec

After this patchset applied:

	Total time: 43.706 [sec]
	437.062080 usecs/op
	2288 ops/sec
Although it's actually ~50 secs after the very latest patch in this
series, which drops the buffer multiplexing. That patch is supposed
to bring more scalability (and I believe it does on a box with more
than two cpus, although I can't test that). Multiplexing the counters
had a side effect, though: perf record had only one buffer to drain
instead of 5 * NR_CPUS, which made its job a bit easier (at the cost
of cpu contention, of course; but on my atom, the scalability gain
from dropping it is not very visible).
And also, after this odd patch:
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 98fd360..254b3d4 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3094,7 +3094,8 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
 	if (event->parent)
 		event = event->parent;
 
-	return task_pid_nr_ns(p, event->ns);
+	return p->pid;
 }
We get:

	Total time: 26.170 [sec]
	261.707960 usecs/op
	3821 ops/sec

I.e., 2x faster than with this patchset alone, and more than 3x
faster than tip:/perf/core.

This is because task_pid_nr_ns() takes a lock, which creates lock
event recursion. We really need to fix that.
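
The kind of recursion protection I mean looks roughly like the
following per-cpu guard (just a sketch of the principle, not
necessarily the shape of the final fix): a lock event generated from
inside the event path itself gets dropped instead of recursing.

/* Sketch: drop lock events generated from inside the event path. */
static DEFINE_PER_CPU(int, lock_event_recursion);

static void lock_event_record(void *data)
{
	int *depth;

	preempt_disable();
	depth = &__get_cpu_var(lock_event_recursion);
	if (*depth)
		goto out;	/* nested event: drop it */

	(*depth)++;
	/* ... write the event to the buffer ... */
	(*depth)--;
out:
	preempt_enable();
}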
You can pull this patchset from:
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
perf/core
Thanks.
---
Frederic Weisbecker (11):
tracing: Add lock_class_init event
tracing: Introduce TRACE_EVENT_INJECT
tracing: Inject lock_class_init events on registration
tracing: Add lock class id in lock_acquire event
perf: New PERF_EVENT_IOC_INJECT ioctl
perf: Handle injection ioctl with trace events
perf: Handle injection ioctl for tracepoints from perf record
perf/lock: Add support for lock_class_init events
tracing: Remove the lock name from most lock events
tracing/perf: Fix lock events recursions in the fast path
perf lock: Drop the buffers multiplexing dependency
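
As an aside, user space is expected to use the injection ioctl
roughly as follows (a sketch; the ioctl number and the no-argument
convention are assumptions of mine, see the PERF_EVENT_IOC_INJECT
patch for the real definition):

#include <sys/ioctl.h>

#ifndef PERF_EVENT_IOC_INJECT
#define PERF_EVENT_IOC_INJECT	_IO('$', 7)	/* number assumed */
#endif

/*
 * Ask the kernel to synthesize the injectable events on this
 * counter, e.g. one lock_class_init per known lock class, so the
 * tool can map class ids back to names.
 */
static int inject_events(int perf_fd)
{
	return ioctl(perf_fd, PERF_EVENT_IOC_INJECT, 0);
}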
 include/linux/ftrace_event.h       |    6 +-
 include/linux/lockdep.h            |    4 +
 include/linux/perf_event.h         |    6 +
 include/linux/tracepoint.h         |    3 +
 include/trace/define_trace.h       |    6 +
 include/trace/events/lock.h        |   57 ++++--
 include/trace/ftrace.h             |   31 +++-
 kernel/lockdep.c                   |   16 ++
 kernel/perf_event.c                |   47 ++++-
 kernel/trace/trace_event_profile.c |   46 +++--
 kernel/trace/trace_events.c        |    3 +
 tools/perf/builtin-lock.c          |  345 ++++++++++++++++++++++++++++++++----
 tools/perf/builtin-record.c        |    9 +
 13 files changed, 497 insertions(+), 82 deletions(-)