Message-ID: <3d1e08929fb74938998bd9aa2e370424@AcuMS.aculab.com>
Date: Tue, 6 Sep 2022 03:30:33 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Tiezhu Yang' <yangtiezhu@...ngson.cn>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
"Arnaldo Carvalho de Melo" <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
"Alexander Shishkin" <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>
CC: "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 2/3] perf bench syscall: Add close syscall benchmark
From: Tiezhu Yang
> Sent: 06 September 2022 04:06
>
> This commit adds a simple close syscall benchmark, more syscall
> benchmarks can be added in the future.
>
...
>
> Signed-off-by: Tiezhu Yang <yangtiezhu@...ngson.cn>
> ---
> tools/perf/bench/bench.h | 1 +
> tools/perf/bench/syscall.c | 11 +++++++++++
> tools/perf/builtin-bench.c | 1 +
> 3 files changed, 13 insertions(+)
>
> diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
> index 6cefb43..916cd47 100644
...
> diff --git a/tools/perf/bench/syscall.c b/tools/perf/bench/syscall.c
> index 746fd71..058394b 100644
> --- a/tools/perf/bench/syscall.c
> +++ b/tools/perf/bench/syscall.c
> @@ -46,6 +46,9 @@ static int bench_syscall_common(int argc, const char **argv, int syscall)
> case __NR_getppid:
> getppid();
> break;
> + case __NR_close:
> + close(dup(0));
This isn't really a close() test.
The dup(0) call is a significant part of each iteration and may
well take longer than the close() itself.
I'm also not sure that using the syscall number for the
test number is entirely sensible.
One thing I have measured in the past is the time taken
to read in an iov[] array.
This can be measured quite nicely using writev() on /dev/null.
(No copies ever happen and iov_iter() is never used.)
But you need to test a few different iov lengths.
I'm also not 100% sure how accurate/repeatable/sensible it
is to use the 'wall clock time' for 1000000 iterations.
A lot of modern CPUs will dynamically change the clock speed
underneath you and other system code (like ethernet receive)
can badly perturb the results.
What you really want to count is cycles - but the TSC now ticks
at a fixed rate whatever the core clock is doing, so it is
useless for that.
The x86 performance counters do have a cycle counter.
I've used that to measure single calls of both library
functions and system calls.
Just 10 iterations give a 'cold cache' value (the first call) and
some very consistent counts (once real outliers are removed).
Indeed the fastest value is really the right one.
For functions like the IP checksum you can even
show that the code is executing in the expected number
of clock cycles (usually limited by memory reads).
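The cycle-counter approach described above can be approximated from
user space with perf_event_open(2). This is only a sketch and not
necessarily how the measurements mentioned here were taken: it reads
the counter with read(), which is itself a system call and adds its
own cycles to every sample (reading the counter directly with rdpmc
would cost far less). The function names are hypothetical, and opening
the counter can fail if /proc/sys/kernel/perf_event_paranoid forbids
counting kernel cycles or the machine exposes no PMU.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a CPU cycle counter for the calling thread; -1 on failure. */
static int open_cycle_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclude_hv = 1;

	/* pid 0, cpu -1: this thread, on whichever CPU it runs. */
	return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Current counter value, or 0 if the read fails. */
static uint64_t read_cycles(int fd)
{
	uint64_t count = 0;

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}

/* Cycles for each of 'runs' single calls; samples[0] is the cold one. */
static void sample_cycles(int fd, void (*fn)(void), uint64_t *samples, int runs)
{
	uint64_t before;
	int i;

	for (i = 0; i < runs; i++) {
		before = read_cycles(fd);
		fn();
		samples[i] = read_cycles(fd) - before;
	}
}

/* The fastest sample, taken as the representative value. */
static uint64_t min_sample(const uint64_t *samples, int runs)
{
	uint64_t best = UINT64_MAX;
	int i;

	for (i = 0; i < runs; i++)
		if (samples[i] < best)
			best = samples[i];
	return best;
}

/* Example target: one getppid() system call. */
static void call_getppid(void)
{
	(void)getppid();
}
```

Sampling an empty function the same way gives the overhead of the two
read() calls, which can then be subtracted from the getppid() figure.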
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)