lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2127583772.183982.1424966563927.JavaMail.zimbra@efficios.com>
Date:	Thu, 26 Feb 2015 16:02:43 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Andi Kleen <ak@...ux.intel.com>, Andi Kleen <andi@...stfloor.org>,
	x86@...nel.org, linux-kernel@...r.kernel.org, oleg@...hat.com,
	paulmck@...ux.vnet.ibm.com, rusty@...tcorp.com.au, mingo@...nel.org
Subject: Re: [RFC][PATCH] module: Optimize __module_address() using a
 latched RB-tree

----- Original Message -----
> From: "Peter Zijlstra" <peterz@...radead.org>
> To: "Andi Kleen" <ak@...ux.intel.com>
> Cc: "Andi Kleen" <andi@...stfloor.org>, x86@...nel.org, linux-kernel@...r.kernel.org, "mathieu desnoyers"
> <mathieu.desnoyers@...icios.com>, oleg@...hat.com, paulmck@...ux.vnet.ibm.com, rusty@...tcorp.com.au,
> mingo@...nel.org
> Sent: Thursday, February 26, 2015 6:43:09 AM
> Subject: [RFC][PATCH] module: Optimize __module_address() using a latched RB-tree
> 
> On Mon, Feb 23, 2015 at 09:43:40AM -0800, Andi Kleen wrote:
> 
> > BTW if you really worry about perf overhead you could
> > gain far more (in some cases ms) by applying
> > http://comments.gmane.org/gmane.linux.kernel/1805207
> 
> 
> Yeah, how about something like so instead; it would work for everybody.
> 
> It might maybe do with a few more comments.. also it appears to work,
> but I'm barely awake so who knows.
> 
> ---
> Subject: module: Optimize __module_address() using a latched RB-tree
> 
> Optimize __module_address() using a latched RB-tree.
> 
> Currently __module_address() is using a linear search through all
> modules in order to find the module corresponding to the provided
> address. With a lot of modules this can take a lot of time.
> 
> One of the users of this is __kernel_text_address() which is employed in
> many stack unwinders; which in turn are used by perf-callchain (possibly
> from NMI context).
> 
> So by optimizing __module_address() we optimize many stack unwinders
> which are used by both perf and tracing in performance sensitive code.

It looks like a good re-use of the raw_write_seqcount_latch
API we introduced for nmi-safe clock monotonic recently. This
latch can indeed be applied to arbitrary data structures, as long
as doubling the memory usage is not an issue.

Comment below,

> 
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Oleg Nesterov <oleg@...hat.com>
> Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
> Cc: Rusty Russell <rusty@...tcorp.com.au>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
>  include/linux/module.h |    8 ++
>  kernel/module.c        |  157
>  +++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 160 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/include/linux/module.h
> ===================================================================
> --- linux-2.6.orig/include/linux/module.h	2015-02-26 10:51:31.750869653 +0100
> +++ linux-2.6/include/linux/module.h	2015-02-26 10:52:15.895872761 +0100
> @@ -17,6 +17,7 @@
>  #include <linux/moduleparam.h>
>  #include <linux/jump_label.h>
>  #include <linux/export.h>
> +#include <linux/rbtree.h>
>  
>  #include <linux/percpu.h>
>  #include <asm/module.h>
> @@ -210,12 +211,19 @@
>  	MODULE_STATE_UNFORMED,	/* Still setting it up. */
>  };
>  
> +struct module_node {
> +	struct module *mod;
> +	struct rb_node node;
> +};
> +
>  struct module {
>  	enum module_state state;
>  
>  	/* Member of list of modules */
>  	struct list_head list;
>  
> +	struct module_node tree_node[4];
> +
>  	/* Unique handle for this module */
>  	char name[MODULE_NAME_LEN];
>  
> Index: linux-2.6/kernel/module.c
> ===================================================================
> --- linux-2.6.orig/kernel/module.c	2015-02-26 10:51:31.750869653 +0100
> +++ linux-2.6/kernel/module.c	2015-02-26 12:19:42.766743460 +0100
> @@ -102,6 +102,180 @@
>  DEFINE_MUTEX(module_mutex);
>  EXPORT_SYMBOL_GPL(module_mutex);
>  static LIST_HEAD(modules);
> +
> +
> +/*
> + * Use a latched RB-tree for __module_address()
> + *
> + * The latch concept is a multi-value concurrency data structure which
> allows
> + * unserialized access and guarantees at least one version is stable.
> + *
> + * It is employed here to optimize __module_address(), which needs to find
> the
> + * module (if any) associated with an address. Such questions are best
> answered
> + * using a binary search tree.
> + *
> + * Since modules use non-overlapping memory ranges we can use a regular
> RB-tree
> + * (as opposed to the interval tree). However, BSTs cannot be iterated while
> + * modified.
> + *
> + * To overcome this we use the latched RB-tree, it basically consists of two
> + * RB-trees which are modified in order, ensuring one is always consistent.
> + *
> + * Things are somewhat more complicated because there are two ranges per
> + * module, but other than that its pretty straight forward.
> + *
> + * idx
> + *  0	- latch 0, init
> + *  1	- latch 1, init
> + *  2	- latch 0, core
> + *  3	- latch 1, core
> + */
> +
> +struct latch_tree_root {
> +	seqcount_t	seq;
> +	struct rb_root	tree[2];
> +};
> +
> +static unsigned long mod_value(struct module *mod, int idx)
> +{
> +	if (idx & 2) {		/* core */
> +		return (unsigned long)mod->module_core;
> +	} else {		/* init */
> +		return (unsigned long)mod->module_init;
> +	}
> +}
> +
> +static struct module *mod_entry(struct rb_node *n)
> +{
> +	struct module_node *mn = container_of(n, struct module_node, node);
> +	return mn->mod;
> +}
> +
> +static int mod_node_idx(struct module *m, struct rb_node *n)
> +{
> +	struct module_node *mn = container_of(n, struct module_node, node);
> +	int idx = mn - m->tree_node;
> +
> +	BUG_ON(mn->mod != m);
> +	BUG_ON((unsigned)idx > 3);
> +
> +	return idx;
> +}
> +
> +static void __tree_insert(struct latch_tree_root *mod_tree, struct module
> *mod, int idx)
> +{
> +	struct rb_root *root = &mod_tree->tree[idx & 1];
> +	struct module_node *mn = &mod->tree_node[idx];
> +	struct rb_node **link = &root->rb_node;
> +	struct rb_node *parent = NULL;
> +	unsigned long mod_val, m_val;
> +	struct module *m;
> +	int i;
> +
> +	mn->mod = mod;
> +	mod_val = mod_value(mod, idx);
> +
> +	while (*link) {
> +		parent = *link;
> +		m = mod_entry(parent);
> +		i = mod_node_idx(m, parent);
> +		m_val = mod_value(m, i);
> +
> +		if (mod_val < m_val)
> +			link = &parent->rb_left;
> +		else
> +			link = &parent->rb_right;
> +	}
> +
> +	rb_link_node(&mn->node, parent, link);
> +	rb_insert_color(&mn->node, root);
> +}
> +
> +static void __tree_remove(struct latch_tree_root *mod_tree, struct module
> *mod, int idx)
> +{
> +	struct rb_root *root = &mod_tree->tree[idx & 1];
> +	struct module_node *mn = &mod->tree_node[idx];
> +
> +	rb_erase(&mn->node, root);
> +}
> +

Perhaps you could use mod_value() below, and introduce a
"mod_size()" too. This would keep the init vs core selection
out of the traversal code.

> +static struct module *__tree_find(struct rb_root *r, unsigned long addr)
> +{
> +	struct rb_node *n = r->rb_node;
> +
> +	while (n) {
> +		struct module *m = mod_entry(n);
> +		int idx = mod_node_idx(m, n);

And change for this ?

		unsigned long m_addr = mod_value(m, n);
		unsigned long m_size = mod_size(m, n);

		if (addr < m_addr) {
			n = n->rb_left;
		else if (addr >= m_addr + m_size)
			n = n->rb_right;
		else
			return m;

> +		if (idx & 2) {				/* core */
> +			if (addr < (unsigned long)m->module_core)
> +				n = n->rb_left;
> +			else if (addr >= (unsigned long)m->module_core + m->core_size)
> +				n = n->rb_right;
> +			else
> +				return m;
> +		} else {				/* init */
> +			if (addr < (unsigned long)m->module_init)
> +				n = n->rb_left;
> +			else if (addr >= (unsigned long)m->module_init + m->init_size)
> +				n = n->rb_right;
> +			else
> +				return m;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct latch_tree_root mod_tree;

Is it customary to define static variables in the
middle of a C file ?

The rest looks good, especially for use of the latch.
I'd be tempted to turn "0, 1, 2, 3" into an enum though,
so we can follow in the code what each of those array
entry really means.

Thanks!

Mathieu

> +
> +static void mod_tree_insert(struct module *mod)
> +{
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_insert(&mod_tree, mod, 0);	/* init */
> +	__tree_insert(&mod_tree, mod, 2);		/* core */
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_insert(&mod_tree, mod, 1);	/* init */
> +	__tree_insert(&mod_tree, mod, 3);		/* core */
> +}
> +
> +static void mod_tree_remove_init(struct module *mod)
> +{
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_remove(&mod_tree, mod, 0);	/* init */
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_remove(&mod_tree, mod, 1);	/* init */
> +}
> +
> +static void mod_tree_remove(struct module *mod)
> +{
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_remove(&mod_tree, mod, 0);	/* init */
> +	__tree_remove(&mod_tree, mod, 2);		/* core */
> +	raw_write_seqcount_latch(&mod_tree.seq);
> +	if (mod->init_size)
> +		__tree_remove(&mod_tree, mod, 1);	/* init */
> +	__tree_remove(&mod_tree, mod, 3);		/* core */
> +}
> +
> +static struct module *mod_tree_find(unsigned long addr)
> +{
> +	struct module *m;
> +	unsigned int seq;
> +
> +	do {
> +		seq = raw_read_seqcount(&mod_tree.seq);
> +		m = __tree_find(&mod_tree.tree[seq & 1], addr);
> +	} while (read_seqcount_retry(&mod_tree.seq, seq));
> +
> +	return m;
> +}
> +
>  #ifdef CONFIG_KGDB_KDB
>  struct list_head *kdb_modules = &modules; /* kdb needs the list of modules
>  */
>  #endif /* CONFIG_KGDB_KDB */
> @@ -1854,6 +2028,7 @@
>  	mutex_lock(&module_mutex);
>  	/* Unlink carefully: kallsyms could be walking list. */
>  	list_del_rcu(&mod->list);
> +	mod_tree_remove(mod);
>  	/* Remove this module from bug list, this uses list_del_rcu */
>  	module_bug_cleanup(mod);
>  	/* Wait for RCU synchronizing before releasing mod->list and buglist. */
> @@ -3098,6 +3273,7 @@
>  	mod->symtab = mod->core_symtab;
>  	mod->strtab = mod->core_strtab;
>  #endif
> +	mod_tree_remove_init(mod);
>  	unset_module_init_ro_nx(mod);
>  	module_arch_freeing_init(mod);
>  	mod->module_init = NULL;
> @@ -3168,6 +3344,7 @@
>  		goto out;
>  	}
>  	list_add_rcu(&mod->list, &modules);
> +	mod_tree_insert(mod);
>  	err = 0;
>  
>  out:
> @@ -3370,6 +3547,7 @@
>  	mutex_lock(&module_mutex);
>  	/* Unlink carefully: kallsyms could be walking list. */
>  	list_del_rcu(&mod->list);
> +	mod_tree_remove(mod);
>  	wake_up_all(&module_wq);
>  	/* Wait for RCU synchronizing before releasing mod->list. */
>  	synchronize_rcu();
> @@ -3810,13 +3988,13 @@
>  	if (addr < module_addr_min || addr > module_addr_max)
>  		return NULL;
>  
> -	list_for_each_entry_rcu(mod, &modules, list) {
> +	mod = mod_tree_find(addr);
> +	if (mod) {
> +		BUG_ON(!within_module(addr, mod));
>  		if (mod->state == MODULE_STATE_UNFORMED)
> -			continue;
> -		if (within_module(addr, mod))
> -			return mod;
> +			mod = NULL;
>  	}
> -	return NULL;
> +	return mod;
>  }
>  EXPORT_SYMBOL_GPL(__module_address);
>  
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ