Message-ID: <20200526155258.GB23187@redhat.com>
Date:   Tue, 26 May 2020 12:52:58 -0300
From:   Arnaldo Carvalho de Melo <acme@...hat.com>
To:     Matt Fleming <matt@...eblueprint.co.uk>
Cc:     Jiri Olsa <jolsa@...nel.org>, linux-kernel@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH v2] perf ordered_events: Optimise event object reuse

Adding a few more folks who worked on the ordering of events over the
years.

Some minor nits at the end of the message.

Thanks!

- Arnaldo

Em Tue, May 26, 2020 at 02:59:28PM +0100, Matt Fleming escreveu:
> ordered_event objects can be placed on the free event list in any order
> which means future allocations may not return objects at sequential
> locations in memory. Getting non-contiguous objects from the free list
> has bad consequences when later iterating over those objects in
> ordered_events__queue().
> 
> For example, large perf.data files can contain trillions of events and
> since objects that are next to each other in the free linked-list can
> point to pretty much anywhere in the object address space, lots of
> cycles in ordered_events__queue() are spent servicing DTLB misses.
> 
> Implement the free event cache using the in-kernel implementation of
> interval trees so that objects can always be allocated from the free
> object cache in sequential order, improving spatial locality and
> reducing DTLB misses.
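
(To make the new allocator concrete for other reviewers, and not as part
of the patch: the idea is to keep free memory as an interval tree of
address ranges and always carve from the lowest-addressed range. Below is
a stripped-down sketch against the tools/include/linux/interval_tree.h
API added by this series, leaving out the neighbour merging that
free_event_put_tree() does:

  #include <linux/interval_tree.h>
  #include <errno.h>
  #include <limits.h>
  #include <stdlib.h>

  static struct rb_root_cached free_ranges = RB_ROOT_CACHED;

  /* Return the free object with the lowest address, or NULL if empty. */
  static void *cache_get(size_t bytes)
  {
  	struct interval_tree_node *it;
  	void *obj;

  	it = interval_tree_iter_first(&free_ranges, 0, ULONG_MAX);
  	if (!it)
  		return NULL;

  	obj = (void *)it->start;
  	if (it->last - it->start + 1 == bytes) {
  		/* Range fully consumed: drop its node. */
  		interval_tree_remove(it, &free_ranges);
  		free(it);
  	} else {
  		/* Shrink the range from the front. */
  		it->start += bytes;
  	}
  	return obj;
  }

  /* Record [obj, obj + bytes) as a free range (no merging here). */
  static int cache_put(void *obj, size_t bytes)
  {
  	struct interval_tree_node *it = malloc(sizeof(*it));

  	if (!it)
  		return -ENOMEM;
  	it->start = (unsigned long)obj;
  	it->last  = it->start + bytes - 1;
  	interval_tree_insert(it, &free_ranges);
  	return 0;
  }

Since interval_tree_iter_first() always hands back the node covering the
lowest address, successive gets walk memory in ascending order, which is
where the spatial locality / DTLB win comes from.)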
> 
> Since the existing linked-list implementation is faster if you're
> working with fewer events, add a --free-event-list option to force the
> use of the linked-list implementation. However, most users will benefit
> from the new interval tree code.
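
(Side note: since --free-event-list is handled in tools/perf/perf.c's
handle_options(), it goes before the subcommand, i.e. something like

  $ perf --free-event-list sched latency -i perf.data --stdio

if I'm reading the perf.c hunk right.)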
> 
> Here are some numbers showing the speed up (reduction in execution time)
> when running perf sched latency on sched events data and perf report on
> HW_CPU_CYCLES.
> 
>  $ perf stat --null -r 10 -- bash -c \
> 	"export PAGER=cat ; perf sched latency -i $file --stdio &>/dev/null"
> 
>   Nr events     File Size   Before    After    Speed up
> --------------  ---------  --------  -------  ----------
>   123318457470     29MB     0.2149    0.2440    -13.5%
>  1651500885357    260MB     3.1205    1.9855    +36.4%
>  3470519751785    506MB     6.1821    3.8941    +37.0%
>  8213765551679   1100MB    13.4875    8.5079    +36.9%
> 15900515973759   1946MB    23.4069   15.0960    +35.5%
> 
> and HW_CPU_CYCLES events:
> 
>  $ perf stat --null -r 10 -- bash -c \
> 	"export PAGER=cat ; perf report -i $file --stdio &>/dev/null"
> 
>   Nr events     File Size   Before    After    Speed up
> --------------  ---------  --------  -------  ----------
>   328805166262     29MB      1.637     1.587    +3.0%
>  3280413919096    253MB     13.381    12.349    +7.7%
>  6475305954370    500MB     25.648    23.753    +7.4%
> 14218430569416   1000MB     52.800    49.036    +7.1%
> 26760562279427   2000MB     97.169    90.129    +7.2%
> 
> Signed-off-by: Matt Fleming <matt@...eblueprint.co.uk>
> ---
> 
> Changes in v2:
> 
>  - Add --free-event-list parameter to fallback to the linked-list
>    implementation of the free event cache.
> 
>  - Rename the free_cache_* API to free_event_* now that we have both
>    the old linked-list version and the newer _tree() calls. Also make
>    the API public so I can drop the #define gunk in the tests.
> 
>  - Update check-header.sh for interval tree files.
> 
>  - Remove leftover struct cache_region type that accidentally snuck into v1.
> 
>  tools/include/linux/interval_tree.h         |  30 +++
>  tools/include/linux/interval_tree_generic.h | 187 ++++++++++++++++++
>  tools/lib/interval_tree.c                   |  12 ++
>  tools/perf/MANIFEST                         |   1 +
>  tools/perf/check-headers.sh                 |   2 +
>  tools/perf/perf.c                           |   6 +
>  tools/perf/tests/Build                      |   1 +
>  tools/perf/tests/builtin-test.c             |   4 +
>  tools/perf/tests/free-event-tree.c          | 201 ++++++++++++++++++++
>  tools/perf/tests/tests.h                    |   2 +
>  tools/perf/util/Build                       |   6 +
>  tools/perf/util/ordered-events.c            | 174 ++++++++++++++++-
>  tools/perf/util/ordered-events.h            |  16 +-
>  13 files changed, 634 insertions(+), 8 deletions(-)
>  create mode 100644 tools/include/linux/interval_tree.h
>  create mode 100644 tools/include/linux/interval_tree_generic.h
>  create mode 100644 tools/lib/interval_tree.c
>  create mode 100644 tools/perf/tests/free-event-tree.c
> 
> diff --git a/tools/include/linux/interval_tree.h b/tools/include/linux/interval_tree.h
> new file mode 100644
> index 000000000000..288c26f50732
> --- /dev/null
> +++ b/tools/include/linux/interval_tree.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_INTERVAL_TREE_H
> +#define _LINUX_INTERVAL_TREE_H
> +
> +#include <linux/rbtree.h>
> +
> +struct interval_tree_node {
> +	struct rb_node rb;
> +	unsigned long start;	/* Start of interval */
> +	unsigned long last;	/* Last location _in_ interval */
> +	unsigned long __subtree_last;
> +};
> +
> +extern void
> +interval_tree_insert(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
> +
> +extern void
> +interval_tree_remove(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
> +
> +extern struct interval_tree_node *
> +interval_tree_iter_first(struct rb_root_cached *root,
> +			 unsigned long start, unsigned long last);
> +
> +extern struct interval_tree_node *
> +interval_tree_iter_next(struct interval_tree_node *node,
> +			unsigned long start, unsigned long last);
> +
> +#endif	/* _LINUX_INTERVAL_TREE_H */
> diff --git a/tools/include/linux/interval_tree_generic.h b/tools/include/linux/interval_tree_generic.h
> new file mode 100644
> index 000000000000..aaa8a0767aa3
> --- /dev/null
> +++ b/tools/include/linux/interval_tree_generic.h
> @@ -0,0 +1,187 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> +  Interval Trees
> +  (C) 2012  Michel Lespinasse <walken@...gle.com>
> +
> +
> +  include/linux/interval_tree_generic.h
> +*/
> +
> +#include <linux/rbtree_augmented.h>
> +
> +/*
> + * Template for implementing interval trees
> + *
> + * ITSTRUCT:   struct type of the interval tree nodes
> + * ITRB:       name of struct rb_node field within ITSTRUCT
> + * ITTYPE:     type of the interval endpoints
> + * ITSUBTREE:  name of ITTYPE field within ITSTRUCT holding last-in-subtree
> + * ITSTART(n): start endpoint of ITSTRUCT node n
> + * ITLAST(n):  last endpoint of ITSTRUCT node n
> + * ITSTATIC:   'static' or empty
> + * ITPREFIX:   prefix to use for the inline tree definitions
> + *
> + * Note - before using this, please consider if generic version
> + * (interval_tree.h) would work for you...
> + */
> +
> +#define INTERVAL_TREE_DEFINE(ITSTRUCT, ITRB, ITTYPE, ITSUBTREE,		      \
> +			     ITSTART, ITLAST, ITSTATIC, ITPREFIX)	      \
> +									      \
> +/* Callbacks for augmented rbtree insert and remove */			      \
> +									      \
> +RB_DECLARE_CALLBACKS_MAX(static, ITPREFIX ## _augment,			      \
> +			 ITSTRUCT, ITRB, ITTYPE, ITSUBTREE, ITLAST)	      \
> +									      \
> +/* Insert / remove interval nodes from the tree */			      \
> +									      \
> +ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)	 	      \
> +{									      \
> +	struct rb_node **link = &root->rb_root.rb_node, *rb_parent = NULL;    \
> +	ITTYPE start = ITSTART(node), last = ITLAST(node);		      \
> +	ITSTRUCT *parent;						      \
> +	bool leftmost = true;						      \
> +									      \
> +	while (*link) {							      \
> +		rb_parent = *link;					      \
> +		parent = rb_entry(rb_parent, ITSTRUCT, ITRB);		      \
> +		if (parent->ITSUBTREE < last)				      \
> +			parent->ITSUBTREE = last;			      \
> +		if (start < ITSTART(parent))				      \
> +			link = &parent->ITRB.rb_left;			      \
> +		else {							      \
> +			link = &parent->ITRB.rb_right;			      \
> +			leftmost = false;				      \
> +		}							      \
> +	}								      \
> +									      \
> +	node->ITSUBTREE = last;						      \
> +	rb_link_node(&node->ITRB, rb_parent, link);			      \
> +	rb_insert_augmented_cached(&node->ITRB, root,			      \
> +				   leftmost, &ITPREFIX ## _augment);	      \
> +}									      \
> +									      \
> +ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)		      \
> +{									      \
> +	rb_erase_augmented_cached(&node->ITRB, root, &ITPREFIX ## _augment);  \
> +}									      \
> +									      \
> +/*									      \
> + * Iterate over intervals intersecting [start;last]			      \
> + *									      \
> + * Note that a node's interval intersects [start;last] iff:		      \
> + *   Cond1: ITSTART(node) <= last					      \
> + * and									      \
> + *   Cond2: start <= ITLAST(node)					      \
> + */									      \
> +									      \
> +static ITSTRUCT *							      \
> +ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last)	      \
> +{									      \
> +	while (true) {							      \
> +		/*							      \
> +		 * Loop invariant: start <= node->ITSUBTREE		      \
> +		 * (Cond2 is satisfied by one of the subtree nodes)	      \
> +		 */							      \
> +		if (node->ITRB.rb_left) {				      \
> +			ITSTRUCT *left = rb_entry(node->ITRB.rb_left,	      \
> +						  ITSTRUCT, ITRB);	      \
> +			if (start <= left->ITSUBTREE) {			      \
> +				/*					      \
> +				 * Some nodes in left subtree satisfy Cond2.  \
> +				 * Iterate to find the leftmost such node N.  \
> +				 * If it also satisfies Cond1, that's the     \
> +				 * match we are looking for. Otherwise, there \
> +				 * is no matching interval as nodes to the    \
> +				 * right of N can't satisfy Cond1 either.     \
> +				 */					      \
> +				node = left;				      \
> +				continue;				      \
> +			}						      \
> +		}							      \
> +		if (ITSTART(node) <= last) {		/* Cond1 */	      \
> +			if (start <= ITLAST(node))	/* Cond2 */	      \
> +				return node;	/* node is leftmost match */  \
> +			if (node->ITRB.rb_right) {			      \
> +				node = rb_entry(node->ITRB.rb_right,	      \
> +						ITSTRUCT, ITRB);	      \
> +				if (start <= node->ITSUBTREE)		      \
> +					continue;			      \
> +			}						      \
> +		}							      \
> +		return NULL;	/* No match */				      \
> +	}								      \
> +}									      \
> +									      \
> +ITSTATIC ITSTRUCT *							      \
> +ITPREFIX ## _iter_first(struct rb_root_cached *root,			      \
> +			ITTYPE start, ITTYPE last)			      \
> +{									      \
> +	ITSTRUCT *node, *leftmost;					      \
> +									      \
> +	if (!root->rb_root.rb_node)					      \
> +		return NULL;						      \
> +									      \
> +	/*								      \
> +	 * Fastpath range intersection/overlap between A: [a0, a1] and	      \
> +	 * B: [b0, b1] is given by:					      \
> +	 *								      \
> +	 *         a0 <= b1 && b0 <= a1					      \
> +	 *								      \
> +	 *  ... where A holds the lock range and B holds the smallest	      \
> +	 * 'start' and largest 'last' in the tree. For the later, we	      \
> +	 * rely on the root node, which by augmented interval tree	      \
> +	 * property, holds the largest value in its last-in-subtree.	      \
> +	 * This allows mitigating some of the tree walk overhead for	      \
> +	 * for non-intersecting ranges, maintained and consulted in O(1).     \
> +	 */								      \
> +	node = rb_entry(root->rb_root.rb_node, ITSTRUCT, ITRB);		      \
> +	if (node->ITSUBTREE < start)					      \
> +		return NULL;						      \
> +									      \
> +	leftmost = rb_entry(root->rb_leftmost, ITSTRUCT, ITRB);		      \
> +	if (ITSTART(leftmost) > last)					      \
> +		return NULL;						      \
> +									      \
> +	return ITPREFIX ## _subtree_search(node, start, last);		      \
> +}									      \
> +									      \
> +ITSTATIC ITSTRUCT *							      \
> +ITPREFIX ## _iter_next(ITSTRUCT *node, ITTYPE start, ITTYPE last)	      \
> +{									      \
> +	struct rb_node *rb = node->ITRB.rb_right, *prev;		      \
> +									      \
> +	while (true) {							      \
> +		/*							      \
> +		 * Loop invariants:					      \
> +		 *   Cond1: ITSTART(node) <= last			      \
> +		 *   rb == node->ITRB.rb_right				      \
> +		 *							      \
> +		 * First, search right subtree if suitable		      \
> +		 */							      \
> +		if (rb) {						      \
> +			ITSTRUCT *right = rb_entry(rb, ITSTRUCT, ITRB);	      \
> +			if (start <= right->ITSUBTREE)			      \
> +				return ITPREFIX ## _subtree_search(right,     \
> +								start, last); \
> +		}							      \
> +									      \
> +		/* Move up the tree until we come from a node's left child */ \
> +		do {							      \
> +			rb = rb_parent(&node->ITRB);			      \
> +			if (!rb)					      \
> +				return NULL;				      \
> +			prev = &node->ITRB;				      \
> +			node = rb_entry(rb, ITSTRUCT, ITRB);		      \
> +			rb = node->ITRB.rb_right;			      \
> +		} while (prev == rb);					      \
> +									      \
> +		/* Check if the node intersects [start;last] */		      \
> +		if (last < ITSTART(node))		/* !Cond1 */	      \
> +			return NULL;					      \
> +		else if (start <= ITLAST(node))		/* Cond2 */	      \
> +			return node;					      \
> +	}								      \
> +}
> diff --git a/tools/lib/interval_tree.c b/tools/lib/interval_tree.c
> new file mode 100644
> index 000000000000..4775c291edd6
> --- /dev/null
> +++ b/tools/lib/interval_tree.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/interval_tree.h>
> +#include <linux/interval_tree_generic.h>
> +#include <linux/compiler.h>
> +#include <linux/export.h>
> +
> +#define START(node) ((node)->start)
> +#define LAST(node)  ((node)->last)
> +
> +INTERVAL_TREE_DEFINE(struct interval_tree_node, rb,
> +		     unsigned long, __subtree_last,
> +		     START, LAST,, interval_tree)
> diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
> index 5d7b947320fb..01fe06660a77 100644
> --- a/tools/perf/MANIFEST
> +++ b/tools/perf/MANIFEST
> @@ -20,4 +20,5 @@ tools/lib/bitmap.c
>  tools/lib/str_error_r.c
>  tools/lib/vsprintf.c
>  tools/lib/zalloc.c
> +tools/lib/interval_tree.c
>  scripts/bpf_helpers_doc.py
> diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh
> index cf147db4e5ca..621709897ae9 100755
> --- a/tools/perf/check-headers.sh
> +++ b/tools/perf/check-headers.sh
> @@ -73,6 +73,8 @@ include/uapi/asm-generic/errno-base.h
>  include/uapi/asm-generic/ioctls.h
>  include/uapi/asm-generic/mman-common.h
>  include/uapi/asm-generic/unistd.h
> +include/linux/interval_tree.h
> +include/linux/interval_tree_generic.h
>  '
>  
>  check_2 () {
> diff --git a/tools/perf/perf.c b/tools/perf/perf.c
> index 27f94b0bb874..ca5d51730ae1 100644
> --- a/tools/perf/perf.c
> +++ b/tools/perf/perf.c
> @@ -21,6 +21,7 @@
>  #include "util/bpf-loader.h"
>  #include "util/debug.h"
>  #include "util/event.h"
> +#include "util/ordered-events.h"
>  #include "util/util.h" // usage()
>  #include "ui/ui.h"
>  #include "perf-sys.h"
> @@ -164,6 +165,7 @@ struct option options[] = {
>  	OPT_ARGUMENT("list-cmds", "list-cmds"),
>  	OPT_ARGUMENT("list-opts", "list-opts"),
>  	OPT_ARGUMENT("debug", "debug"),
> +	OPT_ARGUMENT("free-event-list", "free-event-list"),
>  	OPT_END()
>  };
>  
> @@ -277,6 +279,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>  
>  			(*argv)++;
>  			(*argc)--;
> +		} else if (!strcmp(cmd, "--free-event-list")) {
> +			free_event_list = 1;
> +			if (envchanged)
> +				*envchanged = 1;
>  		} else {
>  			fprintf(stderr, "Unknown option: %s\n", cmd);
>  			usage(perf_usage_string);
> diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
> index b3d1bf13ca07..fc83238c9101 100644
> --- a/tools/perf/tests/Build
> +++ b/tools/perf/tests/Build
> @@ -56,6 +56,7 @@ perf-y += mem2node.o
>  perf-y += maps.o
>  perf-y += time-utils-test.o
>  perf-y += genelf.o
> +perf-y += free-event-tree.o
>  
>  $(OUTPUT)tests/llvm-src-base.c: tests/bpf-script-example.c tests/Build
>  	$(call rule_mkdir)
> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> index b6322eb0f423..68f4728db594 100644
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
> @@ -313,6 +313,10 @@ static struct test generic_tests[] = {
>  		.desc = "maps__merge_in",
>  		.func = test__maps__merge_in,
>  	},
> +	{
> +		.desc = "Free event interval tree",
> +		.func = test__free_event_tree,
> +	},
>  	{
>  		.func = NULL,
>  	},
> diff --git a/tools/perf/tests/free-event-tree.c b/tools/perf/tests/free-event-tree.c
> new file mode 100644
> index 000000000000..fd2fe98ccf81
> --- /dev/null
> +++ b/tools/perf/tests/free-event-tree.c
> @@ -0,0 +1,201 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <stdlib.h>
> +#include <string.h>
> +#include <linux/bitmap.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include "tests.h"
> +#include "ordered-events.h"
> +#include "debug.h"
> +
> +static void *alloc_cache(unsigned int num_objs)
> +{
> +	void *ptr = calloc(sizeof(struct ordered_event), num_objs);
> +	return ptr;
> +}
> +
> +static inline struct ordered_event *
> +cache_obj_entry(void *ptr, unsigned int index)
> +{
> +	struct ordered_event *objs = ptr;
> +	return &objs[index];
> +}
> +
> +static int test_cache_get_put(void)
> +{
> +	struct ordered_events oe;
> +	struct ordered_event *e;
> +	int num_objs, i;
> +	void *ptr;
> +
> +	ordered_events__init(&oe, NULL, NULL);
> +
> +	num_objs = 8;
> +	ptr = alloc_cache(num_objs);
> +
> +	e = free_event_get(&oe);
> +	TEST_ASSERT_VAL("got object from empty cache", e == NULL);
> +
> +	for (i = 0; i < num_objs; i++) {
> +		e = cache_obj_entry(ptr, i);
> +		list_add(&e->list, &oe.events);
> +		free_event_put(&oe, e);
> +	}
> +
> +	for (i = 0; i < num_objs; i++) {
> +		e = free_event_get(&oe);
> +		TEST_ASSERT_VAL("ran out of objects", e != NULL);
> +	}
> +
> +	e = free_event_get(&oe);
> +	TEST_ASSERT_VAL("got object from empty cache after put", e == NULL);
> +
> +	free(ptr);
> +
> +	return 0;
> +}
> +
> +#define CHECK_EVENTS_IN_ORDER(o, addr) \
> +	for (i = 0; i < num_objs; i++) { \
> +		struct ordered_event *__e = free_event_get(o); \
> +		TEST_ASSERT_VAL("out-of-order event object", __e ==  cache_obj_entry(addr, i)); \
> +	}
> +
> +/*
> + * The reason that the free event cache switched to using interval
> + * trees is that accessing event objects in-order (with contiguous
> + * addresses) is much faster than accessing them out-of-order when
> + * working with large numbers of events.
> + *
> + * Check that event objects are always allocated in-order no matter which
> + * order they're added to the cache.
> + */
> +static int test_cache_get_put_in_order(void)
> +{
> +	struct ordered_events oe;
> +	struct ordered_event *e;
> +	unsigned long *obj_map;
> +	int num_objs, i;
> +	void *ptr;
> +
> +	ordered_events__init(&oe, NULL, NULL);
> +
> +	num_objs = 8192;
> +	ptr = alloc_cache(num_objs);
> +
> +	for (i = 0; i < num_objs; i++) {
> +		e = cache_obj_entry(ptr, i);
> +		list_add(&e->list, &oe.events);
> +		free_event_put(&oe, e);
> +	}
> +	CHECK_EVENTS_IN_ORDER(&oe, ptr);
> +
> +	/*
> +	 * Reverse the put order and check we still see ascending
> +	 * addresses on get.
> +	 */
> +	for (i = num_objs-1; i >= 0; i--) {
> +		e = cache_obj_entry(ptr, i);
> +		list_add(&e->list, &oe.events);
> +		free_event_put(&oe, e);
> +	}
> +	CHECK_EVENTS_IN_ORDER(&oe, ptr);
> +
> +	/*
> +	 * Insert objects randomly and check we still see ascending
> +	 * addresses on get.
> +	 */
> +	obj_map = bitmap_alloc(num_objs);
> +	srand(0);
> +
> +	while (!bitmap_full(obj_map, num_objs)) {
> +		i = rand() % num_objs;
> +
> +		/* Not already inserted? */
> +		if (!test_and_set_bit(i, obj_map)) {
> +			e = cache_obj_entry(ptr, i);
> +			list_add(&e->list, &oe.events);
> +			free_event_put(&oe, e);
> +		}
> +	}
> +	CHECK_EVENTS_IN_ORDER(&oe, ptr);
> +
> +	bitmap_free(obj_map);
> +	free(ptr);
> +
> +	return 0;
> +}
> +
> +/*
> + * Punch holes in the object address space and make sure that we can
> + * later fill them without corrupting objects.
> + */
> +static int test_cache_hole_punch(void)
> +{
> +	struct ordered_events oe;
> +	struct ordered_event *e;
> +	int num_objs, i;
> +	int holes[3] = { 2046, 4097, 7084 };
> +	void *ptr;
> +
> +	ordered_events__init(&oe, NULL, NULL);
> +
> +	num_objs = 8192;
> +	ptr = alloc_cache(num_objs);
> +
> +	for (i = 0; i < num_objs; i++) {
> +		switch (i) {
> +		case 2046:
> +		case 4097:
> +		case 7084:
> +			continue;
> +		default:
> +			break;
> +		}
> +
> +		e = cache_obj_entry(ptr, i);
> +		e->timestamp = i;
> +		list_add(&e->list, &oe.events);
> +		free_event_put(&oe, e);
> +	}
> +
> +	for (i = 0; i < 3; i++) {
> +		e = cache_obj_entry(ptr, holes[i]);
> +		e->timestamp = holes[i];
> +		list_add(&e->list, &oe.events);
> +		free_event_put(&oe, e);
> +	}
> +
> +	for (i = 0; i < num_objs; i++) {
> +		e = free_event_get(&oe);
> +
> +		TEST_ASSERT_VAL("out-of-order event object", e ==  cache_obj_entry(ptr, i));
> +		TEST_ASSERT_VAL("corrupt event object", e->timestamp == (u64)i);
> +	}
> +
> +	free(ptr);
> +
> +	return 0;
> +}
> +
> +int test__free_event_tree(struct test *test __maybe_unused, int subtest __maybe_unused)
> +{
> +	int ret;
> +
> +	if (free_event_list)
> +		return TEST_SKIP;
> +
> +	ret = test_cache_get_put();
> +	if (ret)
> +		return ret;
> +
> +	ret = test_cache_get_put_in_order();
> +	if (ret)
> +		return ret;
> +
> +	ret = test_cache_hole_punch();
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
> index 61a1ab032080..fcc565405376 100644
> --- a/tools/perf/tests/tests.h
> +++ b/tools/perf/tests/tests.h
> @@ -117,6 +117,8 @@ bool test__bp_signal_is_supported(void);
>  bool test__bp_account_is_supported(void);
>  bool test__wp_is_supported(void);
>  
> +int test__free_event_tree(struct test *test, int subtest);
> +
>  #if defined(__arm__) || defined(__aarch64__)
>  #ifdef HAVE_DWARF_UNWIND_SUPPORT
>  struct thread;
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index c0cf8dff694e..3bfb4ab1bd7f 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -28,6 +28,7 @@ perf-y += print_binary.o
>  perf-y += rlimit.o
>  perf-y += argv_split.o
>  perf-y += rbtree.o
> +perf-y += interval_tree.o
>  perf-y += libstring.o
>  perf-y += bitmap.o
>  perf-y += hweight.o
> @@ -223,6 +224,7 @@ CFLAGS_find_bit.o      += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ET
>  CFLAGS_rbtree.o        += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
>  CFLAGS_libstring.o     += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
>  CFLAGS_hweight.o       += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
> +CFLAGS_interval_tree.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
>  CFLAGS_parse-events.o  += -Wno-redundant-decls
>  CFLAGS_expr.o          += -Wno-redundant-decls
>  CFLAGS_header.o        += -include $(OUTPUT)PERF-VERSION-FILE
> @@ -251,6 +253,10 @@ $(OUTPUT)util/rbtree.o: ../lib/rbtree.c FORCE
>  	$(call rule_mkdir)
>  	$(call if_changed_dep,cc_o_c)
>  
> +$(OUTPUT)util/interval_tree.o: ../lib/interval_tree.c FORCE
> +	$(call rule_mkdir)
> +	$(call if_changed_dep,cc_o_c)
> +
>  $(OUTPUT)util/libstring.o: ../lib/string.c FORCE
>  	$(call rule_mkdir)
>  	$(call if_changed_dep,cc_o_c)
> diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
> index 359db2b1fcef..775b40129448 100644
> --- a/tools/perf/util/ordered-events.c
> +++ b/tools/perf/util/ordered-events.c
> @@ -4,6 +4,7 @@
>  #include <linux/list.h>
>  #include <linux/compiler.h>
>  #include <linux/string.h>
> +#include <linux/interval_tree.h>
>  #include "ordered-events.h"
>  #include "session.h"
>  #include "asm/bug.h"
> @@ -95,11 +96,160 @@ static void free_dup_event(struct ordered_events *oe, union perf_event *event)
>  		__free_dup_event(oe, event);
>  }
>  
> +/*
> + * Using interval trees for the free object cache gives better
> + * performance as the number of events grows, but the --free-event-list
> + * command-line option exists to force the use of a linked-list for
> + * those users where the overhead of maintaining interval trees is too
> + * high, e.g. when you've only got a small number of events.
> + */
> +bool free_event_list = false;
> +
> +static inline unsigned long cache_region_size(struct interval_tree_node *it)
> +{
> +	return it->last - it->start + 1;
> +}
> +
> +/*
> + * Allocate a new event object from the free event cache.
> + *
> + * Find the first address range in the cache and carve out enough bytes
> + * for an ordered_event objects. The object with the lowest address is
> + * always returned so that subsequent allocations benefit from
> + * contiguous memory accesses (spatial locality).
> + */
> +static struct ordered_event *free_event_get_tree(struct ordered_events *oe)
> +{
> +	struct interval_tree_node *it;
> +	struct ordered_event *new;
> +	size_t bytes = sizeof(*new);
> +
> +	it = interval_tree_iter_first(&oe->cache.rb, 0, ULONG_MAX);
> +	if (!it)
> +		return NULL;
> +
> +	/* Has the cache memory been exhausted? */
> +	assert(cache_region_size(it) >= bytes);
> +
> +	new = (void *)it->start;
> +
> +	if (cache_region_size(it) == bytes) {
> +		interval_tree_remove(it, &oe->cache.rb);
> +		free(it);
> +	} else {
> +		it->start += bytes;
> +	}
> +	return new;
> +}
> +
> +struct ordered_event *free_event_get(struct ordered_events *oe)
> +{
> +	struct list_head *list = &oe->cache.list;
> +
> +	if (!free_event_list)
> +		return free_event_get_tree(oe);
> +
> +	if (!list_empty(list)) {
> +		struct ordered_event *new;
> +		new = list_entry(list->next, struct ordered_event, list);
> +		list_del_init(&new->list);
> +		return new;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int free_event_put_tree(struct ordered_events *oe, struct ordered_event *event)
> +{
> +	struct interval_tree_node *it;
> +	unsigned long start, end;
> +
> +	list_del_init(&event->list);
> +
> +	/*
> +	 * We're inserting a new region of free memory. Check to see if
> +	 * this region can be merged with an existing one. The three cases
> +	 * to consider are:
> +	 *
> +	 *     +----------+-------+
> +	 * 1)  | it->last | event |
> +	 *     +----------+-------+
> +	 *
> +	 *                +-------+-----------+
> +	 * 2)             | event | it->start |
> +	 *                +-------+-----------+
> +	 *
> +	 *     +----------+-------+-------------+
> +	 * 3)  | it->last | event | next->start |
> +	 *     +----------+-------+-------------+
> +	 *
> +	 * Add a byte to either side of 'event' to see if there's an
> +	 * existing free region(s) to merge with.
> +	 */
> +	start = (unsigned long)event;
> +	end = (unsigned long)event + sizeof(*event) - 1;
> +
> +	it = interval_tree_iter_first(&oe->cache.rb, start - 1, end + 1);
> +	if (it) {
> +		struct interval_tree_node *next;
> +
> +		next = interval_tree_iter_next(it, start - 1, end + 1);
> +		interval_tree_remove(it, &oe->cache.rb);
> +
> +		if (next) {
> +			/* Case 3: Merge two regions */
> +			assert((next->start - (it->last + 1)) == sizeof(*event));
> +
> +			interval_tree_remove(next, &oe->cache.rb);
> +
> +			/* merged region keeps it->start, extends to next->last */
> +			it->last = next->last;
> +			free(next);
> +		} else if (it->last < start) {
> +			/* Case 1: Extend end of region */
> +			it->last = end;
> +		} else if (it->start > end) {
> +			/* Case 2: Extend start of region */
> +			it->start = start;
> +		}
> +
> +		interval_tree_insert(it, &oe->cache.rb);
> +		return 0;
> +	}
> +
> +	it = malloc(sizeof(*it));
> +	if (!it)
> +		return -ENOMEM;
> +
> +	it->start = start;
> +	it->last = end;
> +
> +	interval_tree_insert(it, &oe->cache.rb);
> +
> +	return 0;
> +}
> +
> +int free_event_put(struct ordered_events *oe, struct ordered_event *event)
> +{
> +	if (!free_event_list)
> +		return free_event_put_tree(oe, event);
> +
> +	list_move(&event->list, &oe->cache.list);
> +	return 0;
> +}
> +
> +static inline void free_event_init(struct ordered_events *oe)
> +{
> +	if (!free_event_list)
> +		oe->cache.rb = RB_ROOT_CACHED;
> +	else
> +		INIT_LIST_HEAD(&oe->cache.list);
> +}
> +
>  #define MAX_SAMPLE_BUFFER	(64 * 1024 / sizeof(struct ordered_event))
>  static struct ordered_event *alloc_event(struct ordered_events *oe,
>  					 union perf_event *event)
>  {
> -	struct list_head *cache = &oe->cache;
>  	struct ordered_event *new = NULL;
>  	union perf_event *new_event;
>  	size_t size;
> @@ -134,13 +284,22 @@ static struct ordered_event *alloc_event(struct ordered_events *oe,
>  	 *
>  	 * Removal of ordered event object moves it from events to
>  	 * the cache list.
> +	 *
> +	 * The order of entries on the cache list is crucial for
> +	 * performance with large numbers of events. Events can be freed
> +	 * in essentially any order which means that free entries must
> +	 * be kept sorted on insertion to the cache so that subsequent
> +	 * allocation in alloc_event() returns contiguous addresses.
> +	 * Internally objects are sorted using interval trees: see
> +	 * free_event_put_tree().
>  	 */
>  	size = sizeof(*oe->buffer) + MAX_SAMPLE_BUFFER * sizeof(*new);
>  
> -	if (!list_empty(cache)) {
> -		new = list_entry(cache->next, struct ordered_event, list);
> -		list_del_init(&new->list);
> -	} else if (oe->buffer) {
> +	new = free_event_get(oe);
> +	if (new)
> +		goto out;
> +
> +	if (oe->buffer) {
>  		new = &oe->buffer->event[oe->buffer_idx];
>  		if (++oe->buffer_idx == MAX_SAMPLE_BUFFER)
>  			oe->buffer = NULL;
> @@ -164,6 +323,7 @@ static struct ordered_event *alloc_event(struct ordered_events *oe,
>  		return NULL;
>  	}
>  
> +out:
>  	new->event = new_event;
>  	return new;
>  }
> @@ -185,7 +345,7 @@ ordered_events__new_event(struct ordered_events *oe, u64 timestamp,
>  
>  void ordered_events__delete(struct ordered_events *oe, struct ordered_event *event)
>  {
> -	list_move(&event->list, &oe->cache);
> +	free_event_put(oe, event);
>  	oe->nr_events--;
>  	free_dup_event(oe, event->event);
>  	event->event = NULL;
> @@ -361,8 +521,8 @@ void ordered_events__init(struct ordered_events *oe, ordered_events__deliver_t d
>  			  void *data)
>  {
>  	INIT_LIST_HEAD(&oe->events);
> -	INIT_LIST_HEAD(&oe->cache);
>  	INIT_LIST_HEAD(&oe->to_free);
> +	free_event_init(oe);
>  	oe->max_alloc_size = (u64) -1;
>  	oe->cur_alloc_size = 0;
>  	oe->deliver	   = deliver;
> diff --git a/tools/perf/util/ordered-events.h b/tools/perf/util/ordered-events.h
> index 0920fb0ec6cc..51c4383f7f88 100644
> --- a/tools/perf/util/ordered-events.h
> +++ b/tools/perf/util/ordered-events.h
> @@ -3,6 +3,7 @@
>  #define __ORDERED_EVENTS_H
>  
>  #include <linux/types.h>
> +#include <linux/interval_tree.h>
>  
>  struct perf_sample;
>  
> @@ -39,7 +40,15 @@ struct ordered_events {
>  	u64				 max_alloc_size;
>  	u64				 cur_alloc_size;
>  	struct list_head		 events;
> -	struct list_head		 cache;
> +	union {
> +		/*
> +		 * Only one of these can be used for the free event
> +		 * cache per session. Interval trees are used by default
> +		 * (using augmented red-black trees).
> +		 */
> +		struct rb_root_cached	 rb;
> +		struct list_head 	 list;
> +	} cache;
>  	struct list_head		 to_free;
>  	struct ordered_events_buffer	*buffer;
>  	struct ordered_event		*last;
> @@ -74,4 +83,9 @@ void ordered_events__set_copy_on_queue(struct ordered_events *oe, bool copy)
>  {
>  	oe->copy_on_queue = copy;
>  }
> +
> +extern bool free_event_list;
> +struct ordered_event *free_event_get(struct ordered_events *oe);
> +int free_event_put(struct ordered_events *oe, struct ordered_event *event);

Some coding style nits: we prefer to use:

	int ordered_events__free_event_put()

As this operates on a 'struct ordered_events' it should carry that prefix.
Static functions mostly get a pass at not having it, but my preference is
for them to have it as well, as that helps with backtraces and also with
ctags/cscope.

Perhaps this will be better as:

  extern bool ordered_events__free_list;
  struct ordered_event *ordered_events__get_event(struct ordered_events *oe);
  int ordered_events__put_event(struct ordered_events *oe, struct ordered_event *event);

We can't use ordered_events__get() as that is the idiom for getting a
reference count on the 'struct ordered_events' object.

I can fix this up while applying if it is the only issue that comes out of
review.

It looks good, reuses code/ideas tried and tested in the kernel sources,
comes with a thorough 'perf test' entry and improves performance of
perf, great.
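
For anyone else wanting to try it: the new entry should be selectable by
its description, and skipped when the list fallback is forced, i.e.
something like

  $ perf test -v "Free event interval tree"
  $ perf --free-event-list test "Free event interval tree"

assuming I'm reading the 'perf test' glue right.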

- Arnaldo

> +
>  #endif /* __ORDERED_EVENTS_H */
> -- 
> 2.17.1
