Message-ID: <82251fd64c072f8f2b6d13057a744ca949bf04bb.camel@mailbox.org>
Date:   Mon, 24 Apr 2023 09:34:36 -0400
From:   liuwf <liuwf@...lbox.org>
To:     linux-kernel@...r.kernel.org
Subject: Library: [RFC PATCH] btree_blue - a btree with linear traversal
 function and more

From 10ca8a2c319c743561020f578308da136e63e916 Mon Sep 17 00:00:00 2001
From: Liu Weifeng 4019c26b5c0d5f6b <Liuwf@...lbox.org>
Date: Mon, 24 Apr 2023 18:40:39 -0400


btree_blue is a simple btree whose algorithm implementation comes from
Joern Engel's lib/btree. btree_blue extends lib/btree with several features
and optimizations (a minimal usage sketch follows the list below):

	Two-way links between leaf nodes provide fast linear traversal.

	Insert, search, and delete are relatively fast compared with lib/btree
	and rbtree. Even with a large node size (4K), btree_blue is still
	slightly faster than the other trees.

	A key value of 0 is allowed.

	Node sizes are configurable: 256, 512, 1024, 2K, 4K, or 192, 384 ...

	Key lengths of one long, two longs, or four longs (32-bit OS) are supported.
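
Below is a minimal usage sketch (not part of the patch), built only on the
API declared in btree_blue.h; the demo function name btree_blue_demo_init and
the placeholder value 0xdead are illustrative, not anything the library
requires:

#include <linux/btree_blue.h>
#include <linux/gfp.h>
#include <linux/module.h>

static int __init btree_blue_demo_init(void)
{
	struct btree_blue_head head;
	unsigned long key = 42;
	void *val;
	int err;

	/* 512-byte nodes, single-long keys; the flags argument is unused (0). */
	err = btree_blue_init(&head, 512, sizeof(unsigned long), 0);
	if (err)
		return err;

	/* Values are stored as non-NULL pointers; 0xdead is a placeholder. */
	err = btree_blue_insert(&head, &key, (void *)0xdead, GFP_KERNEL);
	if (err)
		goto out;

	val = btree_blue_lookup(&head, &key);
	pr_info("btree_blue demo: lookup(%lu) -> %p\n", key, val);

	val = btree_blue_remove(&head, &key);	/* returns the stored value */
out:
	btree_blue_destroy(&head);
	return err;
}
module_init(btree_blue_demo_init);
MODULE_LICENSE("GPL");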


Joern Engel's lib/btree is logically elegant, fast, and effective (100% slot
usage). When I decided to add features on top of it, such as two-way links
between leaf nodes and larger node sizes (and more, in the future), which are
needed in many practical uses, I found it challenging to keep lib/btree's
speed and slot usage, so I applied several optimizations to preserve, and
hopefully improve, performance.

Here are the results of a simple test. The comparison may be unfair to some of
the other structures: for example, the maple tree is RCU-safe and handles many
complex cases, rbtree supports duplicate keys, and lib/btree's simplicity is a
virtue in itself ...

For the fast linear traversal I have not run a comparative test, but I do not
expect the results of structures without direct links between nodes to be on
the same order of magnitude (a traversal sketch follows this paragraph).
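
To illustrate driving that traversal, here is a hedged sketch using
btree_blue_traverse_from_key() from the patch; count_cb, demo_walk and the
start key are made-up names, and the tree is assumed to have been initialized
and populated as in the sketch above:

#include <linux/btree_blue.h>

/* Illustrative callback: log each visited pair; return true to stop early. */
static bool count_cb(const unsigned long *key, const void *val)
{
	pr_info("visited key=%lu val=%p\n", *key, val);
	return false;			/* false: keep walking */
}

static void demo_walk(struct btree_blue_head *head)
{
	unsigned long start = 1000;
	size_t visited;

	/*
	 * Walk from @start in the GET_PREV direction (the same direction
	 * btree_blue_get_prev() steps), following the prev/next links
	 * between leaf nodes instead of descending the tree again for
	 * every step.
	 */
	visited = btree_blue_traverse_from_key(head, &start, count_cb,
					       GET_PREV);
	pr_info("visited %zu entries\n", visited);
}

demo_walk() would be called from a context like the init sketch above, after
the tree has been populated.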


1. btree_blue with the faster plain linear-comparison version

btree_blue uses a node size of 512 bytes in the first part of the test; leaf
nodes reach a slot usage of 15/16 (higher, e.g. 31/32, with a 1K node size ...),
and non-leaf nodes keep 100% slot usage.

--------------------------------------------------------------------------------
# 1000000 inserts into an empty tree ...

btree_blue is 29% faster than btree, 130% faster than rbtree, and 313% faster
than maple tree

maple tree 1000000 inserts use time: 	989232430 ns
rbtree 1000000 inserts use time: 	552244351 ns
btree 1000000 inserts use time: 	310080508 ns
btree_blue 1000000 inserts use time: 	238926066 ns


# 1000000 searches ...

btree_blue, btree and maple tree achieve almost the same rate, and all are
nearly 100% faster than rbtree.

maple tree 1000000 searches use time: 	241619833 ns
rbtree 1000000 searches use time: 	421757078 ns
btree 1000000 searches use time: 	232226540 ns
btree_blue 1000000 searches use time: 	220941629 ns


# 1000000 mixed insert + delete operations on a tree holding 1000000 nodes ...

btree_blue is 28% faster than btree, 132% faster than rbtree, and 307% faster
than maple tree

maple tree 1000000 mixed insert + delete use time: 	2321929510 ns
rbtree 1000000 mixed insert + delete use time: 		1325037376 ns
btree 1000000 mixed insert + delete use time: 		 734110187 ns
btree_blue 1000000 mixed insert + delete use time: 	 569514235 ns


# delete all 1000000 nodes from the tree until it is empty ...

btree_blue is 34% faster than btree, 117% faster than rbtree, and 613% faster
than maple tree

maple tree 1000000 deletes use time: 	1606556096 ns
rbtree 1000000 deletes use time: 		 489245700 ns
btree 1000000 deletes use time: 		 301982693 ns
btree_blue 1000000 deletes use time: 	 224875145 ns



# btree_blue uses a big node size of 4096 bytes

# 1000000 inserts into an empty tree ...

maple tree 1000000 inserts use time: 	1006161194 ns
rbtree 1000000 inserts use time: 	582635990 ns
btree 1000000 inserts use time: 	311212510 ns
btree_blue 1000000 inserts use time: 	323870887 ns



# 1000000 searches ...

maple tree 1000000 searches use time: 	260520956 ns
rbtree 1000000 searches use time: 	459984150 ns
btree 1000000 searches use time: 	242243634 ns
btree_blue 1000000 searches use time: 	207793404 ns


# 1000000 mixed insert + delete operations on a tree holding 1000000 nodes ...

maple tree 1000000 mixed insert + delete use time: 		2316355888 ns
rbtree 1000000 mixed insert + delete use time: 			1499731734 ns
btree 1000000 mixed insert + delete use time: 			 761392377 ns
btree_blue 1000000 mixed insert + delete use time: 		 737142324 ns


# delete all 1000000 nodes from the tree until it is empty ...

maple tree 1000000 deletes use time: 				1616467751 ns
rbtree 1000000 deletes use time: 				 570675402 ns
btree 1000000 deletes use time: 				 299528612 ns
btree_blue 1000000 deletes use time: 				 226293835 ns



2. btree_blue with the binary-search version, using a node size of 512 bytes
--------------------------------------------------------------------------------

# 1000000 inserts into an empty tree ...

maple tree 1000000 inserts use time: 	1007482573 ns
rbtree 1000000 inserts use time: 	 552520109 ns
btree 1000000 inserts use time: 	 311470955 ns
btree_blue 1000000 inserts use time: 	 305165271 ns


# 1000000 searches ...

maple tree 1000000 searches use time: 	248842879 ns
rbtree 1000000 searches use time: 	459954315 ns
btree 1000000 searches use time: 	239902063 ns
btree_blue 1000000 searches use time: 	251252337 ns


# 1000000 mixed insert + delete operations on a tree holding 1000000 nodes ...

maple tree 1000000 mixed insert + delete use time: 	2279468017 ns
rbtree 1000000 mixed insert + delete use time: 		1567305915 ns
btree 1000000 mixed insert + delete use time: 		 742089882 ns
btree_blue 1000000 mixed insert + delete use time: 	 679494846 ns


# delete all 1000000 nodes from the tree until it is empty ...

maple tree 1000000 deletes use time: 	1630034161 ns
rbtree 1000000 deletes use time: 	 561434384 ns
btree 1000000 deletes use time: 	 317263337 ns
btree_blue 1000000 deletes use time: 	 252872063 ns



# btree_blue uses a node size of 4096 bytes

# 1000000 inserts into an empty tree ...

maple tree 1000000 inserts use time: 	1013781694 ns
rbtree 1000000 inserts use time: 	 549275685 ns
btree 1000000 inserts use time: 	 314653894 ns
btree_blue 1000000 inserts use time: 	 395352766 ns


# 1000000 searches ...

maple tree 1000000 searches use time: 	256280986 ns
rbtree 1000000 searches use time: 	453439598 ns
btree 1000000 searches use time: 	243705745 ns
btree_blue 1000000 searches use time: 	241042481 ns


# 1000000 mixed insert + delete operations on a tree holding 1000000 nodes ...

maple tree 1000000 mixed insert + delete use time: 	2319430653 ns
rbtree 1000000 mixed insert + delete use time: 		1440130422 ns
btree 1000000 mixed insert + delete use time: 		 742443694 ns
btree_blue 1000000 mixed insert + delete use time: 	 886005615 ns


# delete all 1000000 nodes from the tree until it is empty ...

maple tree 1000000 deletes use time: 		1618646788 ns
rbtree 1000000 deletes use time: 		 554263247 ns
btree 1000000 deletes use time: 		 314184624 ns
btree_blue 1000000 deletes use time: 		 258111208 ns



Signed-off-by: Liu Weifeng 4019c26b5c0d5f6b <Liuwf@...lbox.org>
---
 include/linux/btree_blue.h |  87 ++++
 lib/Kconfig                |   4 +
 lib/Makefile               |   1 +
 lib/btree_blue.c           | 971 +++++++++++++++++++++++++++++++++++++
 4 files changed, 1063 insertions(+)
 create mode 100644 include/linux/btree_blue.h
 create mode 100644 lib/btree_blue.c

diff --git a/include/linux/btree_blue.h b/include/linux/btree_blue.h
new file mode 100644
index 000000000000..dfa185d8893b
--- /dev/null
+++ b/include/linux/btree_blue.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef BTREE_BLUE_H
+#define BTREE_BLUE_H
+
+#include <linux/kernel.h>
+#include <linux/mempool.h>
+
+
+#define MAX_TREE_HEIGHT		8
+#define MIN_SLOTS_NUMBER	16
+#define MIN_NODE_SIZE		(MIN_SLOTS_NUMBER * 2 * sizeof(unsigned long))
+
+
+#define GET_PREV		0
+#define GET_NEXT		1
+
+struct btree_blue_head;
+struct btree_blue_node_cb;
+
+
+struct btree_blue_head {
+	unsigned long *node;
+	mempool_t *mempool;
+	
+	u16 node_size;
+	u16 stub_base;
+	u8 keylen;
+	u8 slot_width;
+	u8 height;
+	u8 reserved[1];
+
+	u16 vols[MAX_TREE_HEIGHT + 1];
+};
+
+void *btree_blue_alloc(gfp_t gfp_mask, void *pool_data);
+
+void btree_blue_free(void *element, void *pool_data);
+
+int __must_check btree_blue_init(struct btree_blue_head *head, 
+								 int node_size_in_byte, 
+								 int key_len_in_byte, int flags);
+
+void btree_blue_destroy(struct btree_blue_head *head);
+
+void *btree_blue_lookup(struct btree_blue_head *head, unsigned long *key);
+
+int __must_check btree_blue_insert(struct btree_blue_head *head, 
+								   unsigned long *key, void *val, gfp_t gfp);
+
+int btree_blue_update(struct btree_blue_head *head, unsigned long *key, 
+					  void *val);
+
+void *btree_blue_remove(struct btree_blue_head *head, unsigned long *key);
+
+void* btree_blue_first(struct btree_blue_head *head,  unsigned long * __key);
+void* btree_blue_last(struct btree_blue_head *head,  unsigned long * __key);
+
+void* btree_blue_get_prev(struct btree_blue_head *head, unsigned long * __key);
+void* btree_blue_get_next(struct btree_blue_head *head, unsigned long * __key);
+
+typedef bool (*btree_blue_traverse_FN_T) (const unsigned long *key, 
+										  const void *val);
+
+/*
+ * Visit each key-value pair, starting from @key and continuing toward
+ * @prev_or_next until the first or the last entry.
+ *
+ * If @key is NULL, the visit starts with the last (the biggest) key and
+ * walks toward the smallest.
+ *
+ * @prev_or_next: bool value specifying the visit direction.
+ *
+ * @callback: your function, called in the visit loop for each key-value
+ * pair visited.
+ * If a function like bool myfunc(const unsigned long *key, const void *val)
+ * is passed as @callback, it will see every *key and *val starting from
+ * @key (inclusive). Returning false from the callback continues the
+ * visit; returning true exits the loop.
+ *
+ */
+size_t btree_blue_traverse_from_key(struct btree_blue_head *head, 
+									unsigned long *key, 
+									btree_blue_traverse_FN_T callback, 
+									bool prev_or_next);
+
+#endif
+
diff --git a/lib/Kconfig b/lib/Kconfig
index 9bbf8a4b2108..a0e185ddf389 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -464,6 +464,10 @@ config TEXTSEARCH_FSM
 
 config BTREE
 	bool
+	
+config BTREE_BLUE
+	tristate
+	default m
 
 config INTERVAL_TREE
 	bool
diff --git a/lib/Makefile b/lib/Makefile
index 59bd7c2f793a..fe37f467d219 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -154,6 +154,7 @@ obj-$(CONFIG_TRACE_MMIO_ACCESS) += trace_readwrite.o
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
 
 obj-$(CONFIG_BTREE) += btree.o
+obj-$(CONFIG_BTREE_BLUE) += btree_blue.o
 obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
diff --git a/lib/btree_blue.c b/lib/btree_blue.c
new file mode 100644
index 000000000000..d604a94d0fb4
--- /dev/null
+++ b/lib/btree_blue.c
@@ -0,0 +1,971 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/btree_blue.h>
+#include <linux/cache.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+
+
+#define LONG_PER_U64	(64 / BITS_PER_LONG)
+#define MAX_KEYLEN	(2 * LONG_PER_U64)
+#define VALUE_LEN	(sizeof(unsigned long))
+
+struct btree_blue_slots_info {
+	#if (BITS_PER_LONG == 32)
+		u16 slots_nr;
+		u16 offset;
+	#else
+		u16 slots_nr;
+		u16 offset;
+		u16 reserved_a;
+		u16 reserved_b;
+	#endif
+};
+
+struct btree_blue_node_cb {
+	struct btree_blue_slots_info slots_info;
+	unsigned long slots_base[];
+};
+
+struct btree_blue_stub {
+	unsigned long *prev;
+	unsigned long *next;
+};
+
+
+static struct kmem_cache *btree_blue_cachep;
+
+void *btree_blue_alloc(gfp_t gfp_mask, void *pool_data)
+{
+	return kmem_cache_alloc(btree_blue_cachep, gfp_mask);
+}
+EXPORT_SYMBOL_GPL(btree_blue_alloc);
+
+void btree_blue_free(void *element, void *pool_data)
+{
+	kmem_cache_free(btree_blue_cachep, element);
+}
+EXPORT_SYMBOL_GPL(btree_blue_free);
+
+
+static unsigned long *btree_blue_node_alloc(struct btree_blue_head *head, gfp_t gfp)
+{
+	unsigned long *node;
+
+	node = mempool_alloc(head->mempool, gfp);
+	if (likely(node))
+		memset(node, 0, head->node_size);
+	return node;
+}
+
+static int longcmp(const unsigned long *l1, const unsigned long *l2, size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++) {
+		if (l1[i] < l2[i])
+			return -1;
+		if (l1[i] > l2[i])
+			return 1;
+	}
+	return 0;
+}
+
+static unsigned long *longcpy(unsigned long *dest, const unsigned long *src, size_t n)
+{
+	size_t i;
+
+	for (i = 0; i < n; i++)
+		dest[i] = src[i];
+	return dest;
+}
+
+static unsigned long* bkey(struct btree_blue_head *head, struct btree_blue_node_cb *cb, int n)
+{
+	return cb->slots_base + (cb->slots_info.offset + n) * head->slot_width + 1;
+}
+
+static void* bval(struct btree_blue_head *head, struct btree_blue_node_cb *cb, int n)
+{
+	return (void *)(cb->slots_base[(cb->slots_info.offset + n) * head->slot_width]);
+}
+
+static void setkey(struct btree_blue_head *head, struct btree_blue_node_cb *cb, int n, unsigned long *key)
+{
+	longcpy(bkey(head, cb, n), key, head->keylen);
+}
+
+static void setval(struct btree_blue_head *head, struct btree_blue_node_cb *cb, int n, void *val)
+{
+	cb->slots_base[(cb->slots_info.offset + n) * head->slot_width] = (unsigned long) val;
+}
+
+static int keycmp(struct btree_blue_head *head, struct btree_blue_node_cb *cb, int pos, unsigned long *key)
+{
+	return longcmp(bkey(head, cb, pos), key, head->keylen);
+}
+
+static int getpos(struct btree_blue_head *head, struct btree_blue_node_cb *cb, unsigned long *key)
+{
+	int i, o, x, nr, p, c;
+
+	nr = cb->slots_info.slots_nr;
+	x = nr / 8;
+
+	for (i = 1; i <= x; i++) {
+		p = i * 8 - 1;
+		c = keycmp(head, cb, p, key);
+		if (c < 0) {
+			o = (i - 1) * 8;
+			for (i = 0; i < 7; i++) {
+				p = o + i;
+				c = keycmp(head, cb, p, key);
+				if (c <= 0) {
+					return p;
+				}
+			}
+			return o + 7;
+		} else if (c == 0)
+			return p;
+	}
+	
+	o = x * 8;
+	for (i = 0; i < nr % 8; i++) {
+		p =  o + i;
+		c = keycmp(head, cb, p, key);
+		if (c <= 0)
+			return p;
+	}
+
+	return nr;
+}
+
+static int geteqv(struct btree_blue_head *head, struct btree_blue_node_cb *cb, unsigned long *key)
+{
+	int i, o, x, nr, p, c;
+
+	nr = cb->slots_info.slots_nr;
+	x = nr / 8;
+
+	for (i = 1; i <= x; i++) {
+		p = i * 8 - 1;
+		c = keycmp(head, cb, p, key);
+		if (c < 0) {
+			o = (i - 1) * 8;
+			for (i = 0; i < 7; i++) {
+				p =  o + i;
+				c = keycmp(head, cb, p, key);
+				if (c != 0)
+					continue;
+				else
+					return p;
+			}
+			return nr;
+		} else if (c == 0)
+			return p;
+	}
+	
+	o = x * 8;
+	for (i = 0; i < nr % 8; i++) {
+		p =  o + i;
+		c = keycmp(head, cb, p, key);
+		if (c == 0)
+			return p;
+	}
+
+	return nr;
+}
+
+static inline struct btree_blue_stub * __get_stub(struct btree_blue_head *head, struct btree_blue_node_cb *cb)
+{
+	return (struct btree_blue_stub *)((char *)cb + head->stub_base);
+}
+
+static inline void _shift_slots(struct btree_blue_head *head, struct btree_blue_node_cb *cb,
+				int dest_slot, int src_slot, size_t slots_nr)
+{
+	unsigned long *d = cb->slots_base + (cb->slots_info.offset + dest_slot) * head->slot_width;
+	unsigned long *s = cb->slots_base + (cb->slots_info.offset + src_slot) * head->slot_width;
+
+	size_t n = slots_nr * head->slot_width * sizeof(long);
+
+	memmove(d, s, n);
+}
+
+static inline void _transfer_slots(struct btree_blue_head *head,
+			struct btree_blue_node_cb *dest,
+			struct btree_blue_node_cb *src,
+			int dest_slot, int src_slot, size_t slots_nr)
+{
+	unsigned long *d = dest->slots_base + (dest->slots_info.offset + dest_slot) * head->slot_width;
+	unsigned long *s = src->slots_base + (src->slots_info.offset + src_slot) * head->slot_width;
+
+	size_t n = slots_nr * head->slot_width * sizeof(long);
+
+	memmove(d, s, n);
+}
+
+static inline int shift_slots_on_insert(struct btree_blue_head *head, struct btree_blue_node_cb *node,
+				int pos, int level)
+{
+	int slots_nr = node->slots_info.slots_nr;
+	_shift_slots(head, node, pos + 1, pos, slots_nr - pos);
+	node->slots_info.slots_nr++;
+	return pos;
+}
+
+static inline void delete_slot(struct btree_blue_head *head, struct btree_blue_node_cb *node, 
+			int pos, int level)
+{
+	int slots_nr = node->slots_info.slots_nr;
+	_shift_slots(head, node, pos, pos + 1, slots_nr - pos - 1);
+	node->slots_info.slots_nr--;
+}
+
+
+static inline void split_to_empty(struct btree_blue_head *head, 
+				struct btree_blue_node_cb *dest, 
+				struct btree_blue_node_cb *src, 
+				int level)
+{
+	int slots_nr = src->slots_info.slots_nr / 2;
+
+	_transfer_slots(head, dest, src, 0, 0, slots_nr);
+	dest->slots_info.slots_nr += slots_nr;
+
+	_shift_slots(head, src, 0, slots_nr, src->slots_info.slots_nr - slots_nr);
+	src->slots_info.slots_nr -= slots_nr;
+}
+
+static inline void merge_nodes(struct btree_blue_head *head, 
+								struct btree_blue_node_cb *dest, 
+								struct btree_blue_node_cb *src, int level)
+{
+	int dest_nr, src_nr;
+
+	dest_nr = dest->slots_info.slots_nr;
+	src_nr = src->slots_info.slots_nr;
+
+	_transfer_slots(head, dest, src, dest_nr, 0, src_nr);
+	dest->slots_info.slots_nr += src_nr;
+}
+
+
+static int getpos_bin(struct btree_blue_head *head, struct btree_blue_node_cb *cb, 
+					  unsigned long *key)
+{
+	int l = 0;
+	int h = cb->slots_info.slots_nr;
+	int m, ret;
+
+	while (l < h) {
+		m = (l + h) / 2;
+		ret = keycmp(head, cb, m, key);
+
+		if (ret < 0)
+			h = m;
+		else if (ret > 0)
+			l = m+1;
+		else
+			return m;
+	}
+
+	return h;
+}
+
+static int geteqv_bin(struct btree_blue_head *head, 
+					  struct btree_blue_node_cb *cb, 
+					  unsigned long *key)
+{
+	int l = 0;
+	int h = cb->slots_info.slots_nr;
+	int m, ret;
+
+	while (l < h) {
+		m = (l + h) / 2;
+		ret = keycmp(head, cb, m, key);
+
+		if (ret < 0)
+			h = m;
+		else if (ret > 0)
+			l = m+1;
+		else
+			return m;
+	}
+
+	return cb->slots_info.slots_nr;
+}
+
+
+void *btree_blue_first(struct btree_blue_head *head, unsigned long * __key)
+{
+	int height = head->height;
+	struct btree_blue_node_cb *node = (struct btree_blue_node_cb *)head->node;
+
+	if (height == 0)
+		return NULL;
+
+	for ( ; height > 1; height--)
+		node = bval(head, node, node->slots_info.slots_nr - 1);
+
+	longcpy(__key, bkey(head, node, node->slots_info.slots_nr - 1), head->keylen);
+	return bval(head, node, node->slots_info.slots_nr - 1);
+}
+EXPORT_SYMBOL_GPL(btree_blue_first);
+
+void *btree_blue_last(struct btree_blue_head *head, unsigned long *__key)
+{
+	int height = head->height;
+	struct btree_blue_node_cb *node = (struct btree_blue_node_cb *)head->node;
+
+	if (height == 0)
+		return NULL;
+	for ( ; height > 1; height--)
+		node = bval(head, node, 0);
+
+	longcpy(__key, bkey(head, node, 0), head->keylen);
+	return bval(head, node, 0);
+
+}
+EXPORT_SYMBOL_GPL(btree_blue_last);
+
+static unsigned long *btree_blue_lookup_node(struct btree_blue_head *head, unsigned long *key)
+{
+	int height = head->height;
+	int pos;
+	struct btree_blue_node_cb *node = (struct btree_blue_node_cb *)head->node;
+
+	if (height == 0)
+		return NULL;
+
+	for ( ; height > 1; height--) {
+		pos = getpos(head, node, key);
+		/*pos = getpos_bin(head, node, key);*/
+		if (pos == node->slots_info.slots_nr)
+			return NULL;
+
+		node = bval(head, node, pos);
+	}
+	return (unsigned long *)node;
+}
+
+void *btree_blue_lookup(struct btree_blue_head *head, unsigned long *key)
+{
+	int i;
+	struct btree_blue_node_cb *node;
+
+	node = (struct btree_blue_node_cb *)btree_blue_lookup_node(head, key);
+	if (!node)
+		return NULL;
+
+	i = geteqv(head, node, key);
+	/*i = geteqv_bin(head, node, key);*/
+	if (i == node->slots_info.slots_nr)
+		return NULL;
+
+	return bval(head, node, i);
+}
+EXPORT_SYMBOL_GPL(btree_blue_lookup);
+
+int btree_blue_update(struct btree_blue_head *head, unsigned long *key, void *val)
+{
+	int pos;
+	struct btree_blue_node_cb *node;
+
+	node = (struct btree_blue_node_cb *)btree_blue_lookup_node(head, key);
+	if (!node)
+		return -ENOENT;
+
+	pos = geteqv(head, node, key);
+	/*pos = geteqv_bin(head, node, key);*/
+	
+	if (pos == node->slots_info.slots_nr)
+		return -ENOENT;
+
+	setval(head, node, pos, val);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(btree_blue_update);
+
+
+static void *btree_blue_prev_or_next(struct btree_blue_head *head, 
+							  unsigned long *__key, int flag)
+{
+	int i;
+	struct btree_blue_node_cb *node;
+	unsigned long key[MAX_KEYLEN];
+	int slots_nr;
+
+	if (head->height == 0)
+		return NULL;
+
+	longcpy(key, __key, head->keylen);
+
+	node = (struct btree_blue_node_cb *)btree_blue_lookup_node(head, key);
+	if (!node)
+		return NULL;
+
+	slots_nr = node->slots_info.slots_nr;
+	for (i = 0; i < slots_nr; i++)
+		if (keycmp(head, node, i, key) == 0)
+			break;
+	if (i == slots_nr)
+		return NULL;
+
+	if (flag == GET_PREV) {
+		if (++i < slots_nr) {
+			longcpy(__key, bkey(head, node, i), head->keylen);
+			return bval(head, node, i);
+		} else {
+			struct btree_blue_stub *stub = __get_stub(head, node);
+			if (stub->next) {
+				node = (struct btree_blue_node_cb *)(stub->next);
+				longcpy(__key, bkey(head, node, 0), head->keylen);
+				return bval(head, node, 0);
+			} else
+				return NULL;
+		}
+	}
+
+	/* GET_NEXT  */
+	
+	if (i > 0) {
+		--i;
+		longcpy(__key, bkey(head, node, i), head->keylen);
+		return bval(head, node, i);
+	} else {
+		struct btree_blue_stub *stub = __get_stub(head, node);
+		if (stub->prev) {
+			node = (struct btree_blue_node_cb *)(stub->prev);
+			longcpy(__key, bkey(head, node, node->slots_info.slots_nr - 1), 
+					head->keylen);
+			return bval(head, node, node->slots_info.slots_nr - 1);
+		} else
+			return NULL;
+	}
+}
+
+void* btree_blue_get_prev(struct btree_blue_head *head, unsigned long *__key)
+{
+	return btree_blue_prev_or_next(head, __key, GET_PREV);
+}
+EXPORT_SYMBOL_GPL(btree_blue_get_prev);
+
+void* btree_blue_get_next(struct btree_blue_head *head, unsigned long *__key)
+{
+	return btree_blue_prev_or_next(head, __key, GET_NEXT);
+}
+EXPORT_SYMBOL_GPL(btree_blue_get_next);
+
+size_t btree_blue_traverse_from_key(struct btree_blue_head *head, 
+									unsigned long *key, 
+									btree_blue_traverse_FN_T callback, 
+									bool prev_or_next)
+{
+	struct btree_blue_node_cb *node;
+	struct btree_blue_stub *stub;
+	int i, slots_nr, height;
+	bool term;
+	unsigned long found_key[MAX_KEYLEN];
+	unsigned long found_val;
+	size_t total = 0;
+
+	height = head->height;
+	if (height == 0)
+		return total;
+
+	node = (struct btree_blue_node_cb *)(head->node);
+
+	if (key == NULL) {
+		for ( ; height > 1; height--)
+			node = bval(head, node, 0);
+
+		i = 0;
+		slots_nr = node->slots_info.slots_nr;
+		longcpy(found_key, bkey(head, node, i), head->keylen);
+		found_val = (unsigned long)bval(head, node, i);
+		term = callback((const unsigned long *)(&found_key), 
+						(const void *)found_val);
+		total++;
+		if (term)
+			return total;
+		else
+			goto TRAVERSE;
+	}
+
+	node = (struct btree_blue_node_cb *)btree_blue_lookup_node(head, key);
+	if (!node)
+		return total;
+
+	slots_nr = node->slots_info.slots_nr;
+	for (i = 0; i < slots_nr; i++)
+		if (keycmp(head, node, i, (unsigned long *)key) == 0)
+			break;
+
+	if (i == slots_nr)
+		return total;
+
+	longcpy(found_key, bkey(head, node, i), head->keylen);
+	found_val = (unsigned long)bval(head, node, i);
+	term = callback((const unsigned long *)(&found_key), 
+					(const void *)found_val);
+	total++;
+	if (term)
+		return total;
+
+TRAVERSE:
+
+	if (prev_or_next == GET_PREV) {
+		i = i + 1;
+		do {
+			if (i < slots_nr) {
+				longcpy(found_key, bkey(head, node, i), head->keylen);
+				found_val = (unsigned long)bval(head, node, i);
+				term = callback((const unsigned long *)(&found_key), 
+								(const void *)found_val);
+				total++;
+				if (term)
+					return total;
+				i++;
+			} else {
+				stub = __get_stub(head, node);
+				if (stub->next) {
+					node = (struct btree_blue_node_cb *)(stub->next);
+					slots_nr = node->slots_info.slots_nr;
+					i = 0;
+				} else
+					return total;
+			}
+		} while(true);
+	}
+
+	/* GET_NEXT */
+	i = i - 1;
+	do {
+		if (i >= 0) {
+			longcpy(found_key, bkey(head, node, i), head->keylen);
+			found_val = (unsigned long)bval(head, node, i);
+			term = callback((const unsigned long *)(&found_key), 
+							(const void *)found_val);
+			total++;
+			if (term)
+				return total;
+			i--;
+		} else {
+			stub = __get_stub(head, node);
+			if (stub->prev) {
+				node = (struct btree_blue_node_cb *)(stub->prev);
+				i = node->slots_info.slots_nr - 1;
+			} else
+				return total;
+		}
+	} while(true);
+}
+EXPORT_SYMBOL_GPL(btree_blue_traverse_from_key);
+
+static unsigned long* find_level(struct btree_blue_head *head, 
+								unsigned long *key, int level,
+								struct btree_blue_node_cb **cb_p)
+{
+	struct btree_blue_node_cb *node = (struct btree_blue_node_cb *)head->node;
+	struct btree_blue_node_cb *node_p = node;
+	int height = head->height;
+	int pos;
+
+	for ( ; height > level; height--) {
+		pos = getpos(head, node, key);
+		/*pos = getpos_bin(head, node, key);*/
+		if (pos == node->slots_info.slots_nr ) {
+			/* right-most key is too large, update it */
+			/* FIXME: If the right-most key on higher levels is
+			 * always zero, this wouldn't be necessary. */
+			pos--;
+			setkey(head, node, pos, key);
+		}
+
+		BUG_ON(pos < 0);
+		node_p = node;
+		node = bval(head, node, pos);
+	}
+
+	BUG_ON(!node);
+
+	if (cb_p)
+		*cb_p = node_p;
+
+	return (unsigned long *)node;
+}
+
+static int btree_blue_grow(struct btree_blue_head *head, gfp_t gfp)
+{
+	struct btree_blue_node_cb *node, *node_h;
+
+	node = (struct btree_blue_node_cb *)btree_blue_node_alloc(head, gfp);
+	if (!node)
+		return -ENOMEM;
+
+	if (likely(head->node)) {
+		node_h = (struct btree_blue_node_cb *)head->node;
+		setkey(head, node, 0, bkey(head, node_h, 
+								   node_h->slots_info.slots_nr - 1));
+		setval(head, node, 0, head->node);
+		node->slots_info.slots_nr = 1;
+	}
+
+	head->node = (unsigned long *)node;
+	head->height++;
+
+	return 0;
+}
+
+static void btree_blue_shrink(struct btree_blue_head *head)
+{
+	struct btree_blue_node_cb *node;
+
+	if (head->height <= 1)
+		return;
+
+	node = (struct btree_blue_node_cb *)head->node;
+	BUG_ON(node->slots_info.slots_nr > 1);
+
+	head->node = bval(head, node, 0);
+	head->height--;
+	
+	mempool_free(node, head->mempool);
+}
+
+static int btree_blue_insert_level(struct btree_blue_head *head, 
+								   unsigned long *key, void *val, int level, 
+								   struct btree_blue_node_cb *found, gfp_t gfp)
+{
+	struct btree_blue_node_cb *cb, *cb_new, *cb_p;
+	int pos, slots_nr, err;
+
+	BUG_ON(!val);
+
+	if (head->height < level) {
+		err = btree_blue_grow(head, gfp);
+		if (err)
+			return err;
+
+		found = 0;
+	}
+
+//retry:
+	
+	if (found) {
+		cb = found;
+		cb_p = NULL;
+	} else
+		cb = (struct btree_blue_node_cb *)find_level(head, key, level, &cb_p);
+
+	pos = getpos(head, cb, key);
+	/*pos = getpos_bin(head, cb, key);*/
+
+	slots_nr = cb->slots_info.slots_nr;
+
+	/* two identical keys are not allowed */
+	BUG_ON(pos < slots_nr && keycmp(head, cb, pos, key) == 0);
+
+	if (slots_nr == head->vols[level]) {
+		/* need to split node */
+		struct btree_blue_node_cb *cb_prev;
+		struct btree_blue_stub *stub, *stub_new, *stub_prev;
+
+		cb_new = (struct btree_blue_node_cb *)btree_blue_node_alloc(head, gfp);
+		if (!cb_new)
+			return -ENOMEM;
+
+		err = btree_blue_insert_level(head, bkey(head, cb, slots_nr / 2 - 1),
+				cb_new, level + 1, cb_p, gfp);
+		if (err) {
+			mempool_free(cb_new, head->mempool);
+			return err;
+		}
+
+		if (level == 1) {
+			stub = __get_stub(head, cb);
+			stub_new = __get_stub(head, cb_new);
+			stub_new->next = (unsigned long *)cb;
+
+			if (stub->prev) {				
+				cb_prev = (struct btree_blue_node_cb *)(stub->prev);
+				stub_prev = __get_stub(head, cb_prev);
+				stub_prev->next = (unsigned long *)cb_new;
+				stub_new->prev = stub->prev;
+			}
+			stub->prev = (unsigned long *)cb_new;
+		}
+
+		split_to_empty(head, cb_new, cb, level);
+
+		if (pos <= (slots_nr / 2 - 1)) {
+			slots_nr = slots_nr / 2;
+			cb = cb_new;
+		} else {
+			pos = pos - slots_nr / 2;
+			slots_nr = slots_nr - slots_nr / 2;
+		}
+	}
+
+	BUG_ON(slots_nr >= head->vols[level]);
+
+	/* shift and insert */
+	pos = shift_slots_on_insert(head, cb, pos, level);
+	setkey(head, cb, pos, key);
+	setval(head, cb, pos, val);
+
+	return 0;
+}
+
+int btree_blue_insert(struct btree_blue_head *head, unsigned long *key, void *val, 
+				  gfp_t gfp)
+{
+	BUG_ON(!val);
+	return btree_blue_insert_level(head, key, val, 1, 0, gfp);
+}
+EXPORT_SYMBOL_GPL(btree_blue_insert);
+
+static void *btree_blue_remove_level(struct btree_blue_head *head, 
+									 unsigned long *key, int level);
+
+static void merge(struct btree_blue_head *head, int level,
+			struct btree_blue_node_cb *cb_left,
+			struct btree_blue_node_cb *cb_right,
+			struct btree_blue_node_cb *cb_parent,
+			int lpos)
+{
+	struct btree_blue_node_cb *cb_right_right;
+
+	struct btree_blue_stub *stub_left, *stub_right, *stub_right_right;
+
+	/* Move all keys to the left */
+	merge_nodes(head, cb_left, cb_right, level);
+
+	if (level == 1) {
+		stub_left = __get_stub(head, cb_left);
+		stub_right = __get_stub(head, cb_right);
+
+		if(stub_right->next) {
+			stub_left->next = stub_right->next;
+
+			cb_right_right = (struct btree_blue_node_cb *)(stub_right->next);
+			stub_right_right = __get_stub(head, cb_right_right);
+			stub_right_right->prev = (unsigned long *)cb_left;
+		} else
+			stub_left->next = NULL;
+	}
+
+	/* Exchange left and right child in parent */
+	setval(head, cb_parent, lpos, cb_right);
+	setval(head, cb_parent, lpos + 1, cb_left);
+	/* Remove left (formerly right) child from parent */
+	btree_blue_remove_level(head, bkey(head, cb_parent, lpos), level + 1);
+	mempool_free(cb_right, head->mempool);
+}
+
+static void rebalance(struct btree_blue_head *head, unsigned long *key, 
+					  int level, 
+					  struct btree_blue_node_cb *cb_child, 
+					  struct btree_blue_node_cb *cb_p)
+{
+	struct btree_blue_node_cb *cb_parent, *cb_left, *cb_right;
+	struct btree_blue_stub *stub_child, *stub_left, *stub_right;
+	int i;
+	int slots_nr, slots_nr_left, slots_nr_right;
+
+	slots_nr = cb_child->slots_info.slots_nr;
+
+	if (slots_nr == 0) {
+		/* Because we don't steal entries from a neighbour, this case
+		 * can happen.  Parent node contains a single child, this
+		 * node, so merging with a sibling never happens.
+		 */
+		btree_blue_remove_level(head, key, level + 1);
+
+		if (level == 1) {
+			stub_child = __get_stub(head, cb_child);
+			if (stub_child->prev) {
+				
+				cb_left = (struct btree_blue_node_cb *)(stub_child->prev);
+				stub_left = __get_stub(head, cb_left);
+				stub_left->next = stub_child->next ? stub_child->next : NULL;
+			}
+
+			if (stub_child->next) {
+				cb_right = (struct btree_blue_node_cb *)(stub_child->next);
+				stub_right = __get_stub(head, cb_right);
+				stub_right->prev = stub_child->prev ? stub_child->prev : NULL;
+			}
+		}
+
+		mempool_free(cb_child, head->mempool);
+		return;
+	}
+
+	cb_parent = cb_p;
+	
+	i = getpos(head, cb_parent, key);
+	/*i = getpos_bin(head, cb_parent, key);*/
+
+	BUG_ON(bval(head, cb_parent, i) != cb_child);
+	
+	if (i > 0) {
+		cb_left = bval(head, cb_parent, i - 1);
+		slots_nr_left = cb_left->slots_info.slots_nr;
+
+		if (slots_nr_left + slots_nr <= head->vols[level]) {
+			merge(head, level, cb_left, cb_child, cb_parent, i - 1);
+			return;
+		}
+	}
+
+	if (i + 1 < cb_parent->slots_info.slots_nr) {
+		cb_right = bval(head, cb_parent, i + 1);
+		
+		slots_nr_right = cb_right->slots_info.slots_nr;
+
+		if (slots_nr + slots_nr_right <= head->vols[level]) {
+			merge(head, level, cb_child, cb_right, cb_parent, i);
+			return;
+		}
+	}
+	/*
+	 * We could also try to steal one entry from the left or right
+	 * neighbor.  By not doing so we changed the invariant from
+	 * "all nodes are at least half full" to "no two neighboring
+	 * nodes can be merged".  Which means that the average fill of
+	 * all nodes is still half or better.
+	 */
+}
+
+static void *btree_blue_remove_level(struct btree_blue_head *head, 
+									 unsigned long *key, int level)
+{
+	struct btree_blue_node_cb *cb, *cb_p;
+	int pos;
+	void *ret;
+
+	if (level > head->height) {
+		/* we recursed all the way up */
+		head->height = 0;
+		head->node = NULL;
+		return NULL;
+	}
+
+	cb = (struct btree_blue_node_cb *)find_level(head, key, level, &cb_p);
+	pos = getpos(head, cb, key);
+	/*pos = getpos_bin(head, cb, key);*/
+
+	if ((level == 1) && (pos == cb->slots_info.slots_nr))
+		return NULL;
+
+	ret = bval(head, cb, pos);
+
+	/* remove and shift */
+	delete_slot(head, cb, pos, level);
+
+	if (cb->slots_info.slots_nr < head->vols[level] / 2) {
+		if (level < head->height)
+			rebalance(head, key, level, cb, cb_p);
+		else if (cb->slots_info.slots_nr == 1)
+			btree_blue_shrink(head);
+	}
+
+	return ret;
+}
+
+void *btree_blue_remove(struct btree_blue_head *head, unsigned long *key)
+{
+	if (head->height == 0)
+		return NULL;
+
+	return btree_blue_remove_level(head, key, 1);
+}
+EXPORT_SYMBOL_GPL(btree_blue_remove);
+
+static inline void __btree_blue_init(struct btree_blue_head *head, 
+									 int node_size, int keylen, int flags)
+{
+	int vol;
+
+	head->node = NULL;
+	head->height = 0;
+
+	head->node_size = node_size;
+
+	head->keylen = (keylen * 8) / BITS_PER_LONG;
+
+	head->slot_width = (VALUE_LEN * 8) / BITS_PER_LONG + head->keylen;
+
+	vol = (node_size - sizeof(struct btree_blue_node_cb)) / 
+			(head->slot_width * sizeof(long));
+	for (int i = 2; i < MAX_TREE_HEIGHT + 1; i++)
+		head->vols[i] = vol;
+	vol = (node_size - sizeof(struct btree_blue_node_cb) - 
+		sizeof(struct btree_blue_stub)) / (head->slot_width * sizeof(long));
+	head->vols[0] = head->vols[1] = vol;
+
+	head->stub_base = sizeof(struct btree_blue_node_cb) + head->vols[1] * 
+						(head->slot_width * sizeof(long));
+}
+
+int __must_check btree_blue_init(struct btree_blue_head *head, 
+								 int node_size_in_byte, int key_len_in_byte, 
+								 int flags)
+{
+	if (node_size_in_byte % L1_CACHE_BYTES)
+		return -EINVAL;
+
+	if ((node_size_in_byte < MIN_NODE_SIZE) || (node_size_in_byte > PAGE_SIZE))
+		return -EINVAL;
+
+#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64)
+	if ((key_len_in_byte != sizeof(unsigned long)) && 
+		(key_len_in_byte != 2 * sizeof(unsigned long)))
+		return -EINVAL;
+#else
+	if ((key_len_in_byte != sizeof(unsigned long)) &&
+		(key_len_in_byte != 2 * sizeof(unsigned long)) &&
+		(key_len_in_byte != 4 * sizeof(unsigned long)))
+		return -EINVAL;
+#endif
+
+	btree_blue_cachep = kmem_cache_create("btree_blue_node_cb", 
+										node_size_in_byte,
+										0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!btree_blue_cachep)
+		return -ENOMEM;
+
+	__btree_blue_init(head, node_size_in_byte, key_len_in_byte, flags);
+
+	head->mempool = mempool_create(0, btree_blue_alloc, btree_blue_free, NULL);
+	if (!head->mempool)
+		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(btree_blue_init);
+
+void btree_blue_destroy(struct btree_blue_head *head)
+{
+	mempool_free(head->node, head->mempool);
+	mempool_destroy(head->mempool);
+	head->mempool = NULL;
+}
+EXPORT_SYMBOL_GPL(btree_blue_destroy);
+
+static int __init btree_blue_module_init(void)
+{
+	return 0;
+}
+
+static void __exit btree_blue_module_exit(void)
+{
+	kmem_cache_destroy(btree_blue_cachep);
+}
+
+module_init(btree_blue_module_init);
+module_exit(btree_blue_module_exit);
+
+MODULE_LICENSE("GPL");
-- 
2.30.2

