Message-ID: <20141121155706.GB18625@kernel.org>
Date:	Fri, 21 Nov 2014 12:57:06 -0300
From:	Arnaldo Carvalho de Melo <acme@...nel.org>
To:	Tuan Bui <tuan.d.bui@...com>
Cc:	linux-kernel@...r.kernel.org, dbueso@...e.de,
	a.p.zijlstra@...llo.nl, paulus@...ba.org, artagnon@...il.com,
	jolsa@...hat.com, dvhart@...ux.intel.com,
	Aswin Chandramouleeswaran <aswin@...com>,
	Jason Low <jason.low2@...com>, akpm@...ux-foundation.org,
	mingo@...nel.org
Subject: Re: [PATCH v2] Perf Bench: Locking Microbenchmark

On Thu, Nov 20, 2014 at 11:06:05AM -0800, Tuan Bui wrote:
> Subject: [PATCH] Perf Bench: Locking Microbenchmark
> 
> In response to this thread https://lkml.org/lkml/2014/2/11/93, this is
> a micro benchmark that stresses locking contention in the kernel with
> the creat(2) system call, by spawning multiple processes to spam this
> system call.  This workload generates contention similar to the AIM7
> fserver workload but produces output within seconds.
> 
> With the creat system call, the contention varies with which locks the
> particular file system uses. I have run this benchmark only on the
> ext4 and xfs file systems.
> 
> Running the creat workload on ext4 shows contention on the mutex used
> by ext4_orphan_add() and ext4_orphan_del() to add or delete an inode
> from the list of orphaned inodes. Running the creat workload on xfs
> shows contention on the spinlock used by xfs_log_commit_cil() to
> commit a transaction to the Committed Item List.
> 
> Here is a comparison of this benchmark with AIM7 running the fserver
> workload at 500-1000 users, along with a perf trace, running on an
> ext4 file system.
> 
> The test machine is an 8-socket, 80-core Westmere system with HT off,
> running v3.17-rc6.
> 
> 	AIM7		AIM7		perf-bench	perf-bench
> Users	Jobs/min	Jobs/min/child	Ops/sec		Ops/sec/child
> 500	119668.25	239.34		104249		208
> 600	126074.90	210.12		106136		176
> 700	128662.42	183.80		106175		151
> 800	119822.05	149.78		106290		132
> 900	106150.25	117.94		105230		116
> 1000	104681.29	104.68		106489		106
> 
> Perf trace for AIM7 fserver:

I will rename this from "Perf trace for AIM7 fserver" to "Perf report
for AIM7 fserver", as there is a 'perf trace' tool and that produces
different output, etc.

> 14.51%	reaim  		[kernel.kallsyms]	[k] osq_lock
> 4.98%	reaim  		reaim			[.] add_long
> 4.98%	reaim  		reaim			[.] add_int
> 4.31%	reaim  		[kernel.kallsyms]	[k] mutex_spin_on_owner
> ...
> 
> Perf trace of perf bench creat

Ditto, and here I will replace 'perf bench creat' with the new naming:
"perf bench locking vfs", right?

Yeah:

[acme@zoo linux]$ perf bench
Usage: 
	perf bench [<common options>] <collection> <benchmark> [<options>]

        # List of all available benchmark collections:

         sched: Scheduler and IPC benchmarks
           mem: Memory access benchmarks
          numa: NUMA scheduling and MM benchmarks
         futex: Futex stressing benchmarks
       locking: Kernel locking benchmarks
           all: All benchmarks

[acme@zoo linux]$ perf bench locking

        # List of available benchmarks for collection 'locking':

           vfs: Benchmark vfs using creat(2)
           all: Run all benchmarks in this suite

[acme@zoo linux]$ perf bench locking vfs
# Running 'locking/vfs' benchmark:


> 22.37%	locking-creat  [kernel.kallsyms]	[k] osq_lock
> 5.77%	locking-creat  [kernel.kallsyms]	[k] mutex_spin_on_owner
> 5.31%	locking-creat  [kernel.kallsyms]	[k] _raw_spin_lock
> 5.15%	locking-creat  [jbd2]			[k] jbd2_journal_put_journal_head
> ...
> 
> Changes since v1:
> - Added -j option to specify jobs per process.
> - Changed the name of the microbenchmark from creat to vfs.
> - Changed all instances of threads to processes.
> 
> Signed-off-by: Tuan Bui <tuan.d.bui@...com>
> ---
>  tools/perf/Documentation/perf-bench.txt |   8 +
>  tools/perf/Makefile.perf                |   1 +
>  tools/perf/bench/bench.h                |   1 +
>  tools/perf/bench/locking.c              | 261 ++++++++++++++++++++++++++++++++
>  tools/perf/builtin-bench.c              |   8 +
>  5 files changed, 279 insertions(+)
>  create mode 100644 tools/perf/bench/locking.c
> 
> diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
> index f6480cb..31144af 100644
> --- a/tools/perf/Documentation/perf-bench.txt
> +++ b/tools/perf/Documentation/perf-bench.txt
> @@ -58,6 +58,9 @@ SUBSYSTEM
>  'futex'::
>  	Futex stressing benchmarks.
>  
> +'locking'::
> +        Locking stressing benchmarks that produce results similar to AIM7 fserver.
> +
>  'all'::
>  	All benchmark subsystems.
>  
> @@ -213,6 +216,11 @@ Suite for evaluating wake calls.
>  *requeue*::
>  Suite for evaluating requeue calls.
>  
> +SUITES FOR 'locking'
> +~~~~~~~~~~~~~~~~~~~~
> +*vfs*::
> +Suite for evaluating vfs locking contention through creat(2).
> +
>  SEE ALSO
>  --------
>  linkperf:perf[1]
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index 262916f..c8bee04 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -443,6 +443,7 @@ BUILTIN_OBJS += $(OUTPUT)bench/mem-memset.o
>  BUILTIN_OBJS += $(OUTPUT)bench/futex-hash.o
>  BUILTIN_OBJS += $(OUTPUT)bench/futex-wake.o
>  BUILTIN_OBJS += $(OUTPUT)bench/futex-requeue.o
> +BUILTIN_OBJS += $(OUTPUT)bench/locking.o
>  
>  BUILTIN_OBJS += $(OUTPUT)builtin-diff.o
>  BUILTIN_OBJS += $(OUTPUT)builtin-evlist.o
> diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
> index 3c4dd44..19468c5 100644
> --- a/tools/perf/bench/bench.h
> +++ b/tools/perf/bench/bench.h
> @@ -34,6 +34,7 @@ extern int bench_mem_memset(int argc, const char **argv, const char *prefix);
>  extern int bench_futex_hash(int argc, const char **argv, const char *prefix);
>  extern int bench_futex_wake(int argc, const char **argv, const char *prefix);
>  extern int bench_futex_requeue(int argc, const char **argv, const char *prefix);
> +extern int bench_locking_vfs(int argc, const char **argv, const char *prefix);
>  
>  #define BENCH_FORMAT_DEFAULT_STR	"default"
>  #define BENCH_FORMAT_DEFAULT		0
> diff --git a/tools/perf/bench/locking.c b/tools/perf/bench/locking.c
> new file mode 100644
> index 0000000..97cb07a
> --- /dev/null
> +++ b/tools/perf/bench/locking.c
> @@ -0,0 +1,261 @@
> +/*
> + * locking.c
> + *
> + * Simple micro benchmark that stresses kernel locking contention with
> + * the creat(2) system call, by spawning multiple processes that call
> + * it.
> + *
> + * The output is the average operations/sec across all processes and
> + * the average operations/sec per process.
> + *
> + * Tuan Bui <tuan.d.bui@...com>
> + */
> +
> +#include "../perf.h"
> +#include "../util/util.h"
> +#include "../util/stat.h"
> +#include "../util/parse-options.h"
> +#include "../util/header.h"
> +#include "bench.h"
> +
> +#include <err.h>
> +#include <stdlib.h>
> +#include <sys/time.h>
> +#include <unistd.h>
> +#include <sys/resource.h>
> +#include <linux/futex.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <signal.h>
> +#include <dirent.h>
> +
> +#define NOTSET -1
> +struct worker {
> +	pid_t pid;
> +	unsigned int order_id;
> +	char str[50];
> +};
> +
> +struct timeval start, end, total;
> +static unsigned int start_nr = 100;
> +static unsigned int end_nr = 1100;
> +static unsigned int increment_by = 100;
> +static int bench_dur = NOTSET;
> +static int num_jobs = NOTSET;
> +static bool run_jobs;
> +
> +/* Variables shared between the forked processes */
> +unsigned int *finished, *setup;
> +unsigned long long *shared_workers;
> +/* all processes will block on the same futex */
> +u_int32_t *futex;
> +
> +static const struct option options[] = {
> +	OPT_UINTEGER('s', "start", &start_nr, "Number of processes to start"),
> +	OPT_UINTEGER('e', "end", &end_nr, "Number of processes to end"),
> +	OPT_UINTEGER('i', "increment", &increment_by, "Number of processes to increment by"),
> +	OPT_INTEGER('r', "runtime", &bench_dur, "Specify benchmark runtime in seconds"),
> +	OPT_INTEGER('j', "jobs", &num_jobs, "Specify number of jobs per process"),
> +	OPT_END()
> +};
> +
> +static const char * const bench_locking_vfs_usage[] = {
> +	"perf bench locking vfs <options>",
> +	NULL
> +};
> +
> +/* Running bench vfs workload */
> +static void *run_bench_vfs(struct worker *workers)
> +{
> +	int fd;
> +	unsigned long long nr_ops = 0;
> +	int ret;
> +	int jobs = num_jobs;
> +
> +	sprintf(workers->str, "%d-XXXXXX", getpid());
> +	ret = mkstemp(workers->str);
> +	if (ret < 0)
> +		err(EXIT_FAILURE, "mkstemp");
> +
> +	/* Signal the parent process and wait until all processes are ready to run */
> +	setup[workers->order_id] = 1;
> +	syscall(SYS_futex, futex, FUTEX_WAIT, 0, NULL, NULL, 0);
> +
> +	/* Start of the benchmark: keep looping until the parent process signals completion */
> +	while ((run_jobs ? jobs : (!*finished))) {
> +		fd = creat(workers->str, S_IRWXU);
> +		if (fd < 0)
> +			err(EXIT_FAILURE, "creat");
> +		nr_ops++;
> +		if (run_jobs)
> +			jobs--;
> +		close(fd);
> +	}
> +
> +	unlink(workers->str);
> +	shared_workers[workers->order_id] = nr_ops;
> +	setup[workers->order_id] = 0;
> +	exit(0);
> +}
> +
> +/* Set up the variables shared between the parent and the workers */
> +static void setup_shared(void)
> +{
> +	unsigned int *finished_tmp, *setup_tmp;
> +	unsigned long long *shared_workers_tmp;
> +	u_int32_t *futex_tmp;
> +
> +	/* The finished shared variable is used to signal the start and end of the benchmark */
> +	finished_tmp = (void *)mmap(0, sizeof(unsigned int), PROT_READ|PROT_WRITE,
> +			MAP_SHARED|MAP_ANONYMOUS, -1, 0);
> +	if (finished_tmp == (void *) -1)
> +		err(EXIT_FAILURE, "mmap finished");
> +	finished = finished_tmp;
> +
> +	/* shared_workers is an array of the ops performed by each process */
> +	shared_workers_tmp = (void *)mmap(0, sizeof(unsigned long long)*end_nr,
> +			PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
> +	if (shared_workers_tmp == (void *) -1)
> +		err(EXIT_FAILURE, "mmap shared_workers");
> +	shared_workers = shared_workers_tmp;
> +
> +	/* setup is used by each process to signal that it is done
> +	 * setting up for the benchmark and is ready to run */
> +	setup_tmp = (void *)mmap(0, sizeof(unsigned int)*end_nr,
> +			PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0);
> +	if (setup_tmp == (void *) -1)
> +		err(EXIT_FAILURE, "mmap setup");
> +	setup = setup_tmp;
> +
> +	/* Processes will sleep on this futex until all other processes
> +	 * are done setting up and are ready to run */
> +	futex_tmp = (void *)mmap(0, sizeof(u_int32_t), PROT_READ|PROT_WRITE,
> +			MAP_SHARED|MAP_ANONYMOUS, -1, 0);
> +	if (futex_tmp == (void *) -1)
> +		err(EXIT_FAILURE, "mmap futex");
> +	futex = futex_tmp;
> +	(*futex) = 0;
> +}
> +
> +/* Freeing shared variables */
> +static void free_resources(void)
> +{
> +	if ((munmap(finished, sizeof(unsigned int)) == -1))
> +		err(EXIT_FAILURE, "munmap finished");
> +
> +	if ((munmap(shared_workers, sizeof(unsigned long long) * end_nr) == -1))
> +		err(EXIT_FAILURE, "munmap shared_workers");
> +
> +	if ((munmap(setup, sizeof(unsigned int) * end_nr) == -1))
> +		err(EXIT_FAILURE, "munmap setup");
> +
> +	if ((munmap(futex, sizeof(u_int32_t))) == -1)
> +		err(EXIT_FAILURE, "munmap futex");
> +}
> +
> +/* Spawn the workers and wait until all of them have been
> + * created before starting the workload */
> +static void spawn_workers(void *(*bench_ptr) (struct worker *))
> +{
> +	pid_t parent, child;
> +	unsigned int i, j, k;
> +	struct worker workers;
> +	unsigned long long total_ops;
> +	unsigned int total_workers;
> +
> +	parent = getpid();
> +	setup_shared();
> +
> +	/* Loop over all the runs; each run's process count grows by increment_by */
> +	for (i = start_nr; i <= end_nr; i += increment_by) {
> +
> +		for (j = 0; j < i; j++) {
> +			if (!fork())
> +				break;
> +		}
> +
> +		child = getpid();
> +		/* Initialize child worker struct and run benchmark */
> +		if (child != parent) {
> +			workers.order_id = j;
> +			workers.pid = child;
> +			bench_ptr(&workers);
> +		}
> +		/* Parent sleeps for the duration of the benchmark */
> +		else {
> +			/* Make sure all child processes are created and set up
> +			 * before starting the benchmark for bench_dur seconds */
> +			do {
> +				total_workers = 0;
> +				for (k = 0; k < i; k++)
> +					total_workers = total_workers + setup[k];
> +			} while (total_workers != i);
> +
> +			/* Wake up all sleeping process to run the benchmark */
> +			(*futex) = 1;
> +			syscall(SYS_futex, futex, FUTEX_WAKE, i, NULL, NULL, 0);
> +
> +			/* If the runtime parameter is set */
> +			if (!run_jobs) {
> +				/* All processes are ready; signal them to run */
> +				gettimeofday(&start, NULL);
> +				sleep(bench_dur);
> +				(*finished) = 1;
> +				gettimeofday(&end, NULL);
> +				timersub(&end, &start, &total);
> +
> +				for (k = 0; k < i; k++)
> +					wait(NULL);
> +			}
> +			/* If jobs per process is set */
> +			else {
> +				/* All processes are ready; signal them to run */
> +				gettimeofday(&start, NULL);
> +				/* Wait for all processes to terminate before collecting output */
> +				for (k = 0; k < i; k++)
> +					wait(NULL);
> +				gettimeofday(&end, NULL);
> +				timersub(&end, &start, &total);
> +			}
> +
> +			/* Sum up all the ops by each process and report */
> +			total_ops = 0;
> +			for (k = 0; k < i; k++)
> +				total_ops = total_ops + shared_workers[k];
> +
> +			printf("\n%6d processes: throughput = %llu average ops/sec all processes\n",
> +				i, (total_ops / (!total.tv_sec ? 1 : total.tv_sec)));
> +
> +			printf("%6d processes: throughput = %llu average ops/sec per process\n",
> +				i, ((total_ops/(!total.tv_sec ? 1 : total.tv_sec))/(!i ? 1 : i)));
> +
> +			/* Reset back to 0 for next run */
> +			(*finished) = 0;
> +			(*futex) = 0;
> +		}
> +	}
> +}
> +
> +int bench_locking_vfs(int argc, const char **argv,
> +			const char *prefix __maybe_unused)
> +{
> +	argc = parse_options(argc, argv, options, bench_locking_vfs_usage, 0);
> +
> +	/* Error parsing options, or both the runtime and jobs options are set */
> +	if (argc || ((bench_dur != NOTSET) && (num_jobs != NOTSET))) {
> +		fprintf(stderr, "\n runtime and jobs options cannot both be specified\n");
> +		usage_with_options(bench_locking_vfs_usage, options);
> +		exit(EXIT_FAILURE);
> +	}
> +	/* If neither the runtime nor the jobs option is set, default to runtime only */
> +	if ((bench_dur == NOTSET) && (num_jobs == NOTSET))
> +		bench_dur = 5;
> +
> +	if (num_jobs != NOTSET)
> +		run_jobs = true;
> +
> +	spawn_workers(run_bench_vfs);
> +	free_resources();
> +	return 0;
> +}
> diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
> index b9a56fa..fdfb089 100644
> --- a/tools/perf/builtin-bench.c
> +++ b/tools/perf/builtin-bench.c
> @@ -63,6 +63,13 @@ static struct bench futex_benchmarks[] = {
>  	{ NULL,		NULL,						NULL			}
>  };
>  
> +static struct bench locking_benchmarks[] = {
> +	{ "vfs",	"Benchmark vfs using creat(2)",			bench_locking_vfs	},
> +	{ "all",        "Run all benchmarks in this suite",		NULL			},
> +	{ NULL,		NULL,						NULL			}
> +};
> +
> +
>  struct collection {
>  	const char	*name;
>  	const char	*summary;
> @@ -76,6 +83,7 @@ static struct collection collections[] = {
>  	{ "numa",	"NUMA scheduling and MM benchmarks",		numa_benchmarks		},
>  #endif
>  	{"futex",       "Futex stressing benchmarks",                   futex_benchmarks        },
> +	{"locking",     "Kernel locking benchmarks",                    locking_benchmarks      },
>  	{ "all",	"All benchmarks",				NULL			},
>  	{ NULL,		NULL,						NULL			}
>  };
> -- 
> 1.9.1
> 
> 
