[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20170108213111.7f78e0d665783275ffd739bc@kernel.org>
Date: Sun, 8 Jan 2017 21:31:11 +0900
From: Masami Hiramatsu <mhiramat@...nel.org>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
linux-kernel@...r.kernel.org,
Ananth N Mavinakayanahalli <ananth@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H . Peter Anvin" <hpa@...or.com>,
Andrey Konovalov <andreyknvl@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes'
insn-slots as kernel text area
Hello,
On Sun, 8 Jan 2017 13:22:42 +0900
Masami Hiramatsu <mhiramat@...nel.org> wrote:
> > > Would you have any good benchmark to measure it?
> >
> > Not trivially so; what I did was cobble together a debugfs file that
> > measures the average of the PMI time in perf_sample_event_took(), and a
> > module that has a 10 deep callchain around a while(1) loop. Then perf
> > record with callchains for a few seconds.
> >
> > Generating the callchain does the unwinder thing and ends up calling
> > is_kernel_address() lots.
> >
> > The case I worked on was 0 modules vs 100+ modules in a distro build,
> > which was fairly obviously painful back then, since
> > is_module_text_address() used a linear lookup.
> >
> > I'm not sure I still have all those bits, but I can dig around a bit if
> > you're interested.
>
> Hmm, I tried to do similar thing (make a test module which has a loop with
> 10 deep recursive call and save stack-trace) on kvm, but got very unstable
> results.
> Maybe it needs to run on bare-metal machine.
On bare-metal machine, I tried to use attached module to measure
actual performance degrade.
benchmarks with no kprobeq
$ lsmod | wc -l
123
$ for i in {0..10}; do sudo insmod ./unwind_bench.ko && sudo rmmod unwind_bench;done
[ 245.639150] Start benchmarking.
[ 245.639353] End benchmarking. Elapsed: 620579 cycles, loop count:1000
[ 277.481621] Start benchmarking.
[ 277.481821] End benchmarking. Elapsed: 609523 cycles, loop count:1000
[ 302.210924] Start benchmarking.
[ 302.211077] End benchmarking. Elapsed: 441272 cycles, loop count:1000
[ 302.242931] Start benchmarking.
[ 302.243070] End benchmarking. Elapsed: 401676 cycles, loop count:1000
[ 302.270682] Start benchmarking.
[ 302.270762] End benchmarking. Elapsed: 242776 cycles, loop count:1000
[ 302.294656] Start benchmarking.
[ 302.294737] End benchmarking. Elapsed: 247352 cycles, loop count:1000
[ 302.318517] Start benchmarking.
[ 302.318600] End benchmarking. Elapsed: 255048 cycles, loop count:1000
[ 302.344519] Start benchmarking.
[ 302.344599] End benchmarking. Elapsed: 242392 cycles, loop count:1000
[ 302.369450] Start benchmarking.
[ 302.369530] End benchmarking. Elapsed: 242628 cycles, loop count:1000
[ 302.399517] Start benchmarking.
[ 302.399605] End benchmarking. Elapsed: 268736 cycles, loop count:1000
[ 302.423508] Start benchmarking.
[ 302.423587] End benchmarking. Elapsed: 242068 cycles, loop count:1000
[ 302.451282] Start benchmarking.
[ 302.451361] End benchmarking. Elapsed: 242272 cycles, loop count:1000
[ 302.480656] Start benchmarking.
[ 302.480736] End benchmarking. Elapsed: 243552 cycles, loop count:1000
benchmarks with disabled 106 kprobes (only 89 probes are optimized)
# grep "[tT] audit_*" /proc/kallsyms | while read a b c d; do echo p $c+5 >> kprobe_events ;done
# cat ../kprobes/list | wc -l
106
# echo 1 > events/enable
# grep OPTIMIZED ../kprobes/list | wc -l
89
[ 1037.065151] Start benchmarking.
[ 1037.065500] End benchmarking. Elapsed: 1067196 cycles, loop count:1000
[ 1037.113893] Start benchmarking.
[ 1037.114766] End benchmarking. Elapsed: 2667646 cycles, loop count:1000
[ 1037.149251] Start benchmarking.
[ 1037.149382] End benchmarking. Elapsed: 402500 cycles, loop count:1000
[ 1037.180287] Start benchmarking.
[ 1037.180368] End benchmarking. Elapsed: 246000 cycles, loop count:1000
[ 1037.208416] Start benchmarking.
[ 1037.208503] End benchmarking. Elapsed: 266768 cycles, loop count:1000
[ 1037.236328] Start benchmarking.
[ 1037.236407] End benchmarking. Elapsed: 242396 cycles, loop count:1000
[ 1037.270207] Start benchmarking.
[ 1037.270288] End benchmarking. Elapsed: 246384 cycles, loop count:1000
[ 1037.302086] Start benchmarking.
[ 1037.302174] End benchmarking. Elapsed: 266796 cycles, loop count:1000
[ 1037.330165] Start benchmarking.
[ 1037.330245] End benchmarking. Elapsed: 242888 cycles, loop count:1000
[ 1037.358207] Start benchmarking.
[ 1037.358287] End benchmarking. Elapsed: 243288 cycles, loop count:1000
[ 1037.388134] Start benchmarking.
[ 1037.388213] End benchmarking. Elapsed: 240936 cycles, loop count:1000
FYI, I also tried same benchmark without patch, but it was very unstable.
Benchmarks without patch:
[ 4583.832368] Start benchmarking.
[ 4583.832943] End benchmarking. Elapsed: 1758512 cycles, loop count:1000
[ 4712.636096] Start benchmarking.
[ 4712.636712] End benchmarking. Elapsed: 1881112 cycles, loop count:1000
[ 4712.668672] Start benchmarking.
[ 4712.669254] End benchmarking. Elapsed: 1777584 cycles, loop count:1000
[ 4712.696150] Start benchmarking.
[ 4712.696925] End benchmarking. Elapsed: 2324292 cycles, loop count:1000
[ 4712.727850] Start benchmarking.
[ 4712.727988] End benchmarking. Elapsed: 422768 cycles, loop count:1000
[ 4712.756108] Start benchmarking.
[ 4712.756684] End benchmarking. Elapsed: 1755676 cycles, loop count:1000
[ 4712.780775] Start benchmarking.
[ 4712.781347] End benchmarking. Elapsed: 1749872 cycles, loop count:1000
[ 4712.811910] Start benchmarking.
[ 4712.812510] End benchmarking. Elapsed: 1825972 cycles, loop count:1000
[ 4712.844676] Start benchmarking.
[ 4712.845270] End benchmarking. Elapsed: 1815532 cycles, loop count:1000
[ 4712.876057] Start benchmarking.
[ 4712.876203] End benchmarking. Elapsed: 421700 cycles, loop count:1000
[ 4712.908149] Start benchmarking.
[ 4712.908741] End benchmarking. Elapsed: 1777924 cycles, loop count:1000
[ 4712.943254] Start benchmarking.
[ 4712.943823] End benchmarking. Elapsed: 1754012 cycles, loop count:1000
Thank you,
--
Masami Hiramatsu <mhiramat@...nel.org>
View attachment "unwind_bench.c" of type "text/plain" (993 bytes)
Powered by blists - more mailing lists