Date:	Fri, 30 May 2008 01:22:14 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Nadia.Derbey@...l.net
Cc:	manfred@...orfullife.com, lnxninja@...ux.vnet.ibm.com,
	linux-kernel@...r.kernel.org, efault@....de,
	akpm@...ux-foundation.org
Subject: Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3

On Wed, May 07, 2008 at 01:35:53PM +0200, Nadia.Derbey@...l.net wrote:
> 
> After scalability problems were detected when using the SysV IPCs, I
> proposed using an RCU-based implementation of the IDR API instead (see
> threads http://lkml.org/lkml/2008/4/11/212 and
> http://lkml.org/lkml/2008/4/29/295).
> 
> This resulted in many people asking to convert the idr API and make it
> RCU-safe (because most of the code was duplicated and thus unmaintainable
> and unreviewable).
> 
> So here is a first attempt.
> 
> The important change with respect to the idr API itself is in idr removal:
> idr layers are freed after a grace period, instead of being moved to the
> free list.
> 
> The important change with respect to the ipcs is that idr_find() can now
> be called locklessly inside an RCU read-side critical section.
> 
> Here are the results I've got for the pmsg test sent by Manfred: 
> 
>    2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
> 1         1168441           1064021       876000               947488
> 2         1094264            921059      1549592              1730685
> 3         2082520           1738165      1694370              2324880
> 4         2079929           1695521       404553              2400408
> 5         2898758            406566       391283              3246580
> 6         2921417            261275       263249              3752148
> 7         3308761            126056       191742              4243142
> 8         3329456            100129       141722              4275780
> 
> 1st column: stock 2.6.25-rc3-mm1
> 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
> 3rd column: stock 2.6.25-mm1
> 4th column: 2.6.25-mm1 + this patch series.
> 
> I'll send a chart as a reply to this mail: I don't know how to do that
> with quilt :-(
> 
> 
> Reviewers are more than ever welcome!
> 
> Patches should be applied on linux-2.6.25-mm1, in the following order:
> 
> [ PATCH 01/09 ] : idr_add_rcu_head.patch
> [ PATCH 02/09 ] : idr_rename_routines.patch
> [ PATCH 03/09 ] : idr_fix_printk.patch
> [ PATCH 04/09 ] : idr_rc_to_errno.patch
> [ PATCH 05/09 ] : idr_get_new_rcu_safe.patch
> [ PATCH 06/09 ] : idr_find_rcu_safe.patch
> [ PATCH 07/09 ] : idr_remove_rcu_safe.patch
> [ PATCH 08/09 ] : ipc_fix_ipc_lock.patch
> [ PATCH 09/09 ] : remove_ipc_lock_down.patch
> 
> Patches 2, 3 and 4 do not introduce any functional changes.
> 
> I won't be available before next Tuesday, so please don't be mad at me if
> I don't answer quickly.

I guess in my case, next Tuesday was not an issue.  :-/

Anyway, the idr.c changes look good to me.  Not sure why you are using
INIT_RCU_HEAD() given that call_rcu() completely initializes the fields.
Using INIT_RCU_HEAD() doesn't cause any problems, but does add needless
code.
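
(For reference, call_rcu() itself begins by overwriting both fields of the
rcu_head -- this is from my memory of kernel/rcuclassic.c in this era, so
treat it as a sketch rather than the exact source:

	void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
	{
		head->func = func;	/* both fields are unconditionally */
		head->next = NULL;	/* written before queueing, then the
					 * head is added to the per-CPU
					 * callback list... */
	}

which is why the INIT_RCU_HEAD() calls below are harmless but redundant.)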

Commentary below; overall this looks good from an RCU viewpoint.

							Thanx, Paul

> /*
>  * 2002-10-18  written by Jim Houston jim.houston@...r.com
>  *	Copyright (C) 2002 by Concurrent Computer Corporation
>  *	Distributed under the GNU GPL license version 2.
>  *
>  * Modified by George Anzinger to reuse immediately and to use
>  * find bit instructions.  Also removed _irq on spinlocks.
>  *
>  * Modified by Nadia Derbey to make it RCU safe.
>  *
>  * Small id to pointer translation service.
>  *
>  * It uses a radix tree like structure as a sparse array indexed
>  * by the id to obtain the pointer.  The bitmap makes allocating
>  * a new id quick.
>  *
>  * You call it to allocate an id (an int) and associate with that id a
>  * pointer or whatever; we treat it as a (void *).  You can pass this
>  * id to a user for him to pass back at a later time.  You then pass
>  * that id to this code and it returns your pointer.
>  *
>  * You can release ids at any time. When all ids are released, most of
>  * the memory is returned (we keep IDR_FREE_MAX layers) in a local pool,
>  * so we don't need to go to the memory "store" during an id allocation,
>  * and you don't need to be too concerned about locking or conflicts
>  * with the slab allocator.
>  */
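
To make the indexing concrete for anyone reading along: each level of the
tree consumes IDR_BITS bits of the id (5 bits on 32-bit builds, 6 on 64-bit,
if I remember include/linux/idr.h correctly), which is exactly the
"(id >> (IDR_BITS*l)) & IDR_MASK" computation in sub_alloc() (and its
equivalent in idr_find()) below.  For example, on a 32-bit build
(IDR_BITS = 5, IDR_MASK = 31):

	id = 1234:
		(1234 >>  0) & 31 == 18		/* index in the leaf layer */
		(1234 >>  5) & 31 ==  6		/* index one level up */
		(1234 >> 10) & 31 ==  1		/* index at the top of a
						 * three-level tree */
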
> 
> #ifndef TEST                        // to test in user space...
> #include <linux/slab.h>
> #include <linux/init.h>
> #include <linux/module.h>
> #endif
> #include <linux/err.h>
> #include <linux/string.h>
> #include <linux/idr.h>
> 
> static struct kmem_cache *idr_layer_cache;
> 
> static struct idr_layer *get_from_free_list(struct idr *idp)
> {
> 	struct idr_layer *p;
> 	unsigned long flags;
> 
> 	spin_lock_irqsave(&idp->lock, flags);
> 	if ((p = idp->id_free)) {
> 		idp->id_free = p->ary[0];
> 		idp->id_free_cnt--;
> 		p->ary[0] = NULL;

OK, this is the freelist which is inaccessible to readers.

> 	}
> 	spin_unlock_irqrestore(&idp->lock, flags);
> 	return(p);
> }
> 
> static void idr_layer_rcu_free(struct rcu_head *head)
> {
> 	struct idr_layer *layer;
> 
> 	layer = container_of(head, struct idr_layer, rcu_head);
> 	kmem_cache_free(idr_layer_cache, layer);
> }
> 
> static inline void free_layer(struct idr_layer *p)
> {
> 	call_rcu(&p->rcu_head, idr_layer_rcu_free);
> }
> 
> /* only called when idp->lock is held */
> static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
> {
> 	p->ary[0] = idp->id_free;

OK, this is the freelist which is inaccessible to readers.

> 	idp->id_free = p;
> 	idp->id_free_cnt++;
> }
> 
> static void move_to_free_list(struct idr *idp, struct idr_layer *p)
> {
> 	unsigned long flags;
> 
> 	/*
> 	 * Depends on the return element being zeroed.
> 	 */
> 	spin_lock_irqsave(&idp->lock, flags);
> 	__move_to_free_list(idp, p);
> 	spin_unlock_irqrestore(&idp->lock, flags);
> }
> 
> static void idr_mark_full(struct idr_layer **pa, int id)
> {
> 	struct idr_layer *p = pa[0];
> 	int l = 0;
> 
> 	__set_bit(id & IDR_MASK, &p->bitmap);
> 	/*
> 	 * If this layer is full mark the bit in the layer above to
> 	 * show that this part of the radix tree is full.  This may
> 	 * complete the layer above and require walking up the radix
> 	 * tree.
> 	 */
> 	while (p->bitmap == IDR_FULL) {
> 		if (!(p = pa[++l]))
> 			break;
> 		id = id >> IDR_BITS;
> 		__set_bit((id & IDR_MASK), &p->bitmap);
> 	}
> }
> 
> /**
>  * idr_pre_get - reserve resources for idr allocation
>  * @idp:	idr handle
>  * @gfp_mask:	memory allocation flags
>  *
>  * This function should be called prior to locking and calling the
>  * idr_get_new* functions. It preallocates enough memory to satisfy
>  * the worst possible allocation.
>  *
>  * If the system is REALLY out of memory this function returns 0,
>  * otherwise 1.
>  */
> int idr_pre_get(struct idr *idp, gfp_t gfp_mask)
> {
> 	while (idp->id_free_cnt < IDR_FREE_MAX) {
> 		struct idr_layer *new;
> 		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
> 		if (new == NULL)
> 			return (0);
> 		move_to_free_list(idp, new);
> 	}
> 	return 1;
> }
> EXPORT_SYMBOL(idr_pre_get);
> 
> static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
> {
> 	int n, m, sh;
> 	struct idr_layer *p, *new;
> 	int l, id, oid;
> 	unsigned long bm;
> 
> 	id = *starting_id;
>  restart:
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	l = idp->layers;
> 	pa[l--] = NULL;
> 	while (1) {
> 		/*
> 		 * We run around this while until we reach the leaf node...
> 		 */
> 		n = (id >> (IDR_BITS*l)) & IDR_MASK;
> 		bm = ~p->bitmap;
> 		m = find_next_bit(&bm, IDR_SIZE, n);
> 		if (m == IDR_SIZE) {
> 			/* no space available go back to previous layer. */
> 			l++;
> 			oid = id;
> 			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
> 
> 			/* if already at the top layer, we need to grow */
> 			if (!(p = pa[l])) {
> 				*starting_id = id;
> 				return IDR_NEED_TO_GROW;
> 			}
> 
> 			/* If we need to go up one layer, continue the
> 			 * loop; otherwise, restart from the top.
> 			 */
> 			sh = IDR_BITS * (l + 1);
> 			if (oid >> sh == id >> sh)
> 				continue;
> 			else
> 				goto restart;
> 		}
> 		if (m != n) {
> 			sh = IDR_BITS*l;
> 			id = ((id >> sh) ^ n ^ m) << sh;
> 		}
> 		if ((id >= MAX_ID_BIT) || (id < 0))
> 			return IDR_NOMORE_SPACE;
> 		if (l == 0)
> 			break;
> 		/*
> 		 * Create the layer below if it is missing.
> 		 */
> 		if (!p->ary[m]) {

OK, we aren't dereferencing.  Besides, we should hold the update-side
lock at this point.

> 			new = get_from_free_list(idp);
> 			if (!new)
> 				return -1;
> 			INIT_RCU_HEAD(&new->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 			rcu_assign_pointer(p->ary[m], new);
> 			p->count++;
> 		}
> 		pa[l--] = p;
> 		p = p->ary[m];

Holding update-side lock.

> 	}
> 
> 	pa[l] = p;
> 	return id;
> }
> 
> static int idr_get_empty_slot(struct idr *idp, int starting_id,
> 			      struct idr_layer **pa)
> {
> 	struct idr_layer *p, *new;
> 	int layers, v, id;
> 	unsigned long flags;
> 
> 	id = starting_id;
> build_up:
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	layers = idp->layers;
> 	if (unlikely(!p)) {
> 		if (!(p = get_from_free_list(idp)))
> 			return -1;
> 		INIT_RCU_HEAD(&p->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 		layers = 1;
> 	}
> 	/*
> 	 * Add a new layer to the top of the tree if the requested
> 	 * id is larger than the currently allocated space.
> 	 */
> 	while ((layers < (MAX_LEVEL - 1)) && (id >= (1 << (layers*IDR_BITS)))) {
> 		layers++;
> 		if (!p->count)
> 			continue;
> 		if (!(new = get_from_free_list(idp))) {
> 			/*
> 			 * The allocation failed.  If we built part of
> 			 * the structure tear it down.
> 			 */
> 			spin_lock_irqsave(&idp->lock, flags);
> 			for (new = p; p && p != idp->top; new = p) {
> 				p = p->ary[0];
> 				new->ary[0] = NULL;

OK, this presumably has not yet been made accessible to readers.

> 				new->bitmap = new->count = 0;
> 				__move_to_free_list(idp, new);
> 			}
> 			spin_unlock_irqrestore(&idp->lock, flags);
> 			return -1;
> 		}
> 		new->ary[0] = p;

OK, this presumably has not yet been made accessible to readers.

> 		new->count = 1;
> 		INIT_RCU_HEAD(&new->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 		if (p->bitmap == IDR_FULL)
> 			__set_bit(0, &new->bitmap);
> 		p = new;
> 	}
> 	rcu_assign_pointer(idp->top, p);
> 	idp->layers = layers;
> 	v = sub_alloc(idp, &id, pa);
> 	if (v == IDR_NEED_TO_GROW)
> 		goto build_up;
> 	return(v);
> }
> 
> static int idr_get_new_above_int(struct idr *idp, void *ptr, int starting_id)
> {
> 	struct idr_layer *pa[MAX_LEVEL];
> 	int id;
> 
> 	id = idr_get_empty_slot(idp, starting_id, pa);
> 	if (id >= 0) {
> 		/*
> 		 * Successfully found an empty slot.  Install the user
> 		 * pointer and mark the slot full.
> 		 */
> 		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
> 				(struct idr_layer *)ptr);
> 		pa[0]->count++;
> 		idr_mark_full(pa, id);
> 	}
> 
> 	return id;
> }
> 
> /**
>  * idr_get_new_above - allocate new idr entry above or equal to a start id
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @starting_id: id to start search at
>  * @id: pointer to the allocated handle
>  *
>  * This is the allocate id function.  It should be called with any
>  * required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the idr_pre_get() call.  If the idr is full, it will
>  * return -ENOSPC.
>  *
>  * @id returns a value in the range 0 ... 0x7fffffff
>  */
> int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
> {
> 	int rv;
> 
> 	rv = idr_get_new_above_int(idp, ptr, starting_id);
> 	/*
> 	 * This is a cheap hack until the IDR code can be fixed to
> 	 * return proper error values.
> 	 */
> 	if (rv < 0)
> 		return _idr_rc_to_errno(rv);
> 	*id = rv;
> 	return 0;
> }
> EXPORT_SYMBOL(idr_get_new_above);
> 
> /**
>  * idr_get_new - allocate new idr entry
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @id: pointer to the allocated handle
>  *
>  * This is the allocate id function.  It should be called with any
>  * required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the idr_pre_get() call.  If the idr is full, it will
>  * return -ENOSPC.
>  *
>  * @id returns a value in the range 0 ... 0x7fffffff
>  */
> int idr_get_new(struct idr *idp, void *ptr, int *id)
> {
> 	int rv;
> 
> 	rv = idr_get_new_above_int(idp, ptr, 0);
> 	/*
> 	 * This is a cheap hack until the IDR code can be fixed to
> 	 * return proper error values.
> 	 */
> 	if (rv < 0)
> 		return _idr_rc_to_errno(rv);
> 	*id = rv;
> 	return 0;
> }
> EXPORT_SYMBOL(idr_get_new);
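
For anyone not familiar with this API, the calling convention described in
the comment above works out to the usual pre-get/retry loop -- a minimal
sketch, with my_idr, my_lock and my_obj being made-up names rather than
anything from this patch set:

	int error, id;

	do {
		if (!idr_pre_get(&my_idr, GFP_KERNEL))
			return -ENOMEM;		/* really out of memory */
		spin_lock(&my_lock);		/* whatever protects the idr */
		error = idr_get_new(&my_idr, my_obj, &id);
		spin_unlock(&my_lock);
	} while (error == -EAGAIN);

With this patch, lockless readers doing idr_find() will see my_obj as soon
as the rcu_assign_pointer() in idr_get_new_above_int() publishes it.
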
> 
> static void idr_remove_warning(int id)
> {
> 	printk(KERN_WARNING
> 		"idr_remove called for id=%d which is not allocated.\n", id);
> 	dump_stack();
> }
> 
> static void sub_remove(struct idr *idp, int shift, int id)
> {
> 	struct idr_layer *p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	struct idr_layer **pa[MAX_LEVEL];
> 	struct idr_layer ***paa = &pa[0];
> 	struct idr_layer *to_free;
> 	int n;
> 
> 	*paa = NULL;
> 	*++paa = &idp->top;
> 
> 	while ((shift > 0) && p) {
> 		n = (id >> shift) & IDR_MASK;
> 		__clear_bit(n, &p->bitmap);
> 		*++paa = &p->ary[n];

OK, the caller presumably holds an update-side lock.

> 		p = p->ary[n];
> 		shift -= IDR_BITS;
> 	}
> 	n = id & IDR_MASK;
> 	if (likely(p != NULL && test_bit(n, &p->bitmap))){
> 		__clear_bit(n, &p->bitmap);
> 		rcu_assign_pointer(p->ary[n], NULL);
> 		to_free = NULL;
> 		while(*paa && ! --((**paa)->count)){
> 			if (to_free)
> 				free_layer(to_free);
> 			to_free = **paa;
> 			**paa-- = NULL;
> 		}
> 		if (!*paa)
> 			idp->layers = 0;
> 		if (to_free)
> 			free_layer(to_free);
> 	} else
> 		idr_remove_warning(id);
> }
> 
> /**
>  * idr_remove - remove the given id and free its slot
>  * @idp: idr handle
>  * @id: unique key
>  */
> void idr_remove(struct idr *idp, int id)
> {
> 	struct idr_layer *p;
> 	struct idr_layer *to_free;
> 
> 	/* Mask off upper bits we don't use for the search. */
> 	id &= MAX_ID_MASK;
> 
> 	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
> 	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
> 	    idp->top->ary[0]) {

OK, the caller presumably holds the update-side lock.

> 		/*
> 		 * Single child at leftmost slot: we can shrink the tree.
> 		 * This level is not needed anymore since when layers are
> 		 * inserted, they are inserted at the top of the existing
> 		 * tree.
> 		 */
> 		to_free = idp->top;
> 		p = idp->top->ary[0];

OK, the caller presumably holds the update-side lock.

> 		rcu_assign_pointer(idp->top, p);
> 		--idp->layers;
> 		to_free->bitmap = to_free->count = 0;
> 		free_layer(to_free);
> 	}
> 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
> 		p = get_from_free_list(idp);
> 		/*
> 		 * Note: we don't call the rcu callback here, since the only
> 		 * layers that fall into the freelist are those that have been
> 		 * preallocated.
> 		 */
> 		kmem_cache_free(idr_layer_cache, p);
> 	}
> 	return;
> }
> EXPORT_SYMBOL(idr_remove);
> 
> /**
>  * idr_remove_all - remove all ids from the given idr tree
>  * @idp: idr handle
>  *
>  * idr_destroy() only frees up unused, cached idr_layers, but this
>  * function will remove all id mappings and leave all idr_layers
>  * unused.
>  *
>  * A typical clean-up sequence for objects stored in an idr tree will
>  * use idr_for_each() to free all objects, if necessary, then
>  * idr_remove_all() to remove all ids, and idr_destroy() to free
>  * up the cached idr_layers.
>  */
> void idr_remove_all(struct idr *idp)
> {
> 	int n, id, max;
> 	struct idr_layer *p;
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct idr_layer **paa = &pa[0];
> 
> 	n = idp->layers * IDR_BITS;
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	max = 1 << n;
> 
> 	id = 0;
> 	while (id < max) {
> 		while (n > IDR_BITS && p) {
> 			n -= IDR_BITS;
> 			*paa++ = p;
> 			p = p->ary[(id >> n) & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 		}
> 
> 		id += 1 << n;
> 		while (n < fls(id)) {
> 			if (p)
> 				free_layer(p);
> 			n += IDR_BITS;
> 			p = *--paa;
> 		}
> 	}
> 	rcu_assign_pointer(idp->top, NULL);
> 	idp->layers = 0;
> }
> EXPORT_SYMBOL(idr_remove_all);
> 
> /**
>  * idr_destroy - release all cached layers within an idr tree
>  * @idp:	idr handle
>  */
> void idr_destroy(struct idr *idp)
> {
> 	while (idp->id_free_cnt) {
> 		struct idr_layer *p = get_from_free_list(idp);
> 		kmem_cache_free(idr_layer_cache, p);
> 	}
> }
> EXPORT_SYMBOL(idr_destroy);
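
Spelling out the clean-up sequence from the idr_remove_all() comment above
(again with invented names, and assuming the caller has already excluded
all readers and writers):

	static int free_one(int id, void *p, void *data)
	{
		kfree(p);			/* or whatever frees the object */
		return 0;
	}

	...
	idr_for_each(&my_idr, free_one, NULL);	/* free the stored objects */
	idr_remove_all(&my_idr);		/* drop all the id mappings */
	idr_destroy(&my_idr);			/* free the cached idr_layers */

One nice property of the new code: because idr_remove_all() goes through
free_layer(), the layers themselves are only reclaimed after a grace period,
so any straggling lockless readers cannot step on freed memory.
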
> 
> /**
>  * idr_find - return pointer for given id
>  * @idp: idr handle
>  * @id: lookup key
>  *
>  * Return the pointer given the id it has been registered with.  A %NULL
>  * return indicates that @id is not valid or you passed %NULL in
>  * idr_get_new().
>  *
>  * This function can be called under rcu_read_lock(), given that the leaf
>  * pointers' lifetimes are correctly managed.
>  */
> void *idr_find(struct idr *idp, int id)
> {
> 	int n;
> 	struct idr_layer *p;
> 
> 	n = idp->layers * IDR_BITS;
> 	p = rcu_dereference(idp->top);
> 
> 	/* Mask off upper bits we don't use for the search. */
> 	id &= MAX_ID_MASK;
> 
> 	if (id >= (1 << n))
> 		return NULL;
> 
> 	while (n > 0 && p) {
> 		n -= IDR_BITS;
> 		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
> 	}
> 	return((void *)p);
> }
> EXPORT_SYMBOL(idr_find);
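
And this is the payoff for the ipc code: a lookup can now be as simple as
the following (my own illustration, not code from the ipc patches; my_idr
and the object lifetime rules are assumptions on my part):

	struct my_obj *obj;

	rcu_read_lock();
	obj = idr_find(&my_idr, id);
	if (obj) {
		/* obj may be removed from the tree at any time, so it must
		 * itself be freed only after a grace period (or be
		 * reference counted) for this dereference to be safe.
		 */
		do_something_with(obj);
	}
	rcu_read_unlock();

The grace-period deferral in free_layer() is what guarantees that the
idr_layer nodes this traversal walks through cannot be recycled out from
under it.
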
> 
> /**
>  * idr_for_each - iterate through all stored pointers
>  * @idp: idr handle
>  * @fn: function to be called for each pointer
>  * @data: data passed back to callback function
>  *
>  * Iterate over the pointers registered with the given idr.  The
>  * callback function will be called for each pointer currently
>  * registered, passing the id, the pointer and the data pointer passed
>  * to this function.  It is not safe to modify the idr tree while in
>  * the callback, so functions such as idr_get_new and idr_remove are
>  * not allowed.
>  *
>  * We check the return of @fn each time. If it returns anything other
>  * than 0, we break out and return that value.
>  *
>  * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
>  */
> int idr_for_each(struct idr *idp,
> 		 int (*fn)(int id, void *p, void *data), void *data)
> {
> 	int n, id, max, error = 0;
> 	struct idr_layer *p;
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct idr_layer **paa = &pa[0];
> 
> 	n = idp->layers * IDR_BITS;
> 	p = rcu_dereference(idp->top);
> 	max = 1 << n;
> 
> 	id = 0;
> 	while (id < max) {
> 		while (n > 0 && p) {
> 			n -= IDR_BITS;
> 			*paa++ = p;
> 			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
> 		}
> 
> 		if (p) {
> 			error = fn(id, (void *)p, data);
> 			if (error)
> 				break;
> 		}
> 
> 		id += 1 << n;
> 		while (n < fls(id)) {
> 			n += IDR_BITS;
> 			p = *--paa;
> 		}
> 	}
> 
> 	return error;
> }
> EXPORT_SYMBOL(idr_for_each);
> 
> /**
>  * idr_replace - replace pointer for given id
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @id: lookup key
>  *
>  * Replace the pointer registered with an id and return the old value.
>  * A -ENOENT return indicates that @id was not found.
>  * A -EINVAL return indicates that @id was not within valid constraints.
>  *
>  * The caller must serialize with writers.
>  */
> void *idr_replace(struct idr *idp, void *ptr, int id)
> {
> 	int n;
> 	struct idr_layer *p, *old_p;
> 
> 	n = idp->layers * IDR_BITS;
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 
> 	id &= MAX_ID_MASK;
> 
> 	if (id >= (1 << n))
> 		return ERR_PTR(-EINVAL);
> 
> 	n -= IDR_BITS;
> 	while ((n > 0) && p) {
> 		p = p->ary[(id >> n) & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 		n -= IDR_BITS;
> 	}
> 
> 	n = id & IDR_MASK;
> 	if (unlikely(p == NULL || !test_bit(n, &p->bitmap)))
> 		return ERR_PTR(-ENOENT);
> 
> 	old_p = p->ary[n];

OK, the caller presumably holds the update-side lock.

> 	rcu_assign_pointer(p->ary[n], ptr);
> 
> 	return old_p;
> }
> EXPORT_SYMBOL(idr_replace);
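
Worth noting for users of idr_replace() in combination with lockless
idr_find(): the old object returned here may still be referenced by readers,
so the updater must defer freeing it.  A sketch (my_obj, my_lock and
my_obj_rcu_free are invented for illustration):

	struct my_obj *old;

	spin_lock(&my_lock);			/* update-side exclusion */
	old = idr_replace(&my_idr, new_obj, id);
	spin_unlock(&my_lock);
	if (!IS_ERR(old))
		call_rcu(&old->rcu, my_obj_rcu_free);	/* free after grace period */
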
> 
> static void idr_cache_ctor(struct kmem_cache *idr_layer_cache, void *idr_layer)
> {
> 	memset(idr_layer, 0, sizeof(struct idr_layer));
> }
> 
> void __init idr_init_cache(void)
> {
> 	idr_layer_cache = kmem_cache_create("idr_layer_cache",
> 				sizeof(struct idr_layer), 0, SLAB_PANIC,
> 				idr_cache_ctor);
> }
> 
> /**
>  * idr_init - initialize idr handle
>  * @idp:	idr handle
>  *
>  * This function is used to set up the handle (@idp) that you will pass
>  * to the rest of the functions.
>  */
> void idr_init(struct idr *idp)
> {
> 	memset(idp, 0, sizeof(struct idr));
> 	spin_lock_init(&idp->lock);
> }
> EXPORT_SYMBOL(idr_init);
> 
> 
> /*
>  * IDA - IDR based ID allocator
>  *
>  * This is an id allocator without id -> pointer translation.  Memory
>  * usage is much lower than with a full-blown idr because each id only
>  * occupies a bit.  ida uses a custom leaf node which contains
>  * IDA_BITMAP_BITS slots.
>  *
>  * 2007-04-25  written by Tejun Heo <htejun@...il.com>
>  */
> 
> static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
> {
> 	unsigned long flags;
> 
> 	if (!ida->free_bitmap) {
> 		spin_lock_irqsave(&ida->idr.lock, flags);
> 		if (!ida->free_bitmap) {
> 			ida->free_bitmap = bitmap;
> 			bitmap = NULL;
> 		}
> 		spin_unlock_irqrestore(&ida->idr.lock, flags);
> 	}
> 
> 	kfree(bitmap);
> }
> 
> /**
>  * ida_pre_get - reserve resources for ida allocation
>  * @ida:	ida handle
>  * @gfp_mask:	memory allocation flag
>  *
>  * This function should be called prior to locking and calling the
>  * ida_get_new* functions.  It preallocates enough memory to satisfy the
>  * worst possible allocation.
>  *
>  * If the system is REALLY out of memory this function returns 0,
>  * otherwise 1.
>  */
> int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
> {
> 	/* allocate idr_layers */
> 	if (!idr_pre_get(&ida->idr, gfp_mask))
> 		return 0;
> 
> 	/* allocate free_bitmap */
> 	if (!ida->free_bitmap) {
> 		struct ida_bitmap *bitmap;
> 
> 		bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
> 		if (!bitmap)
> 			return 0;
> 
> 		free_bitmap(ida, bitmap);
> 	}
> 
> 	return 1;
> }
> EXPORT_SYMBOL(ida_pre_get);
> 
> /**
>  * ida_get_new_above - allocate new ID above or equal to a start id
>  * @ida:	ida handle
>  * @starting_id:	id to start search at
>  * @p_id:	pointer to the allocated handle
>  *
>  * Allocate new ID above or equal to @starting_id.  It should be called with
>  * any required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the ida_pre_get() call.  If the ida is full, it will
>  * return -ENOSPC.
>  *
>  * @p_id returns a value in the range 0 ... 0x7fffffff.
>  */
> int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
> {
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct ida_bitmap *bitmap;
> 	unsigned long flags;
> 	int idr_id = starting_id / IDA_BITMAP_BITS;
> 	int offset = starting_id % IDA_BITMAP_BITS;
> 	int t, id;
> 
>  restart:
> 	/* get vacant slot */
> 	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
> 	if (t < 0)
> 		return _idr_rc_to_errno(t);
> 
> 	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
> 		return -ENOSPC;
> 
> 	if (t != idr_id)
> 		offset = 0;
> 	idr_id = t;
> 
> 	/* if bitmap isn't there, create a new one */
> 	bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 	if (!bitmap) {
> 		spin_lock_irqsave(&ida->idr.lock, flags);
> 		bitmap = ida->free_bitmap;
> 		ida->free_bitmap = NULL;
> 		spin_unlock_irqrestore(&ida->idr.lock, flags);
> 
> 		if (!bitmap)
> 			return -EAGAIN;
> 
> 		memset(bitmap, 0, sizeof(struct ida_bitmap));
> 		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
> 				(void *)bitmap);
> 		pa[0]->count++;
> 	}
> 
> 	/* lookup for empty slot */
> 	t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
> 	if (t == IDA_BITMAP_BITS) {
> 		/* no empty slot after offset, continue to the next chunk */
> 		idr_id++;
> 		offset = 0;
> 		goto restart;
> 	}
> 
> 	id = idr_id * IDA_BITMAP_BITS + t;
> 	if (id >= MAX_ID_BIT)
> 		return -ENOSPC;
> 
> 	__set_bit(t, bitmap->bitmap);
> 	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
> 		idr_mark_full(pa, idr_id);
> 
> 	*p_id = id;
> 
> 	/* Each leaf node can handle nearly a thousand slots and the
> 	 * whole idea of ida is to have a small memory footprint.
> 	 * Throw away extra resources one by one after each successful
> 	 * allocation.
> 	 */
> 	if (ida->idr.id_free_cnt || ida->free_bitmap) {
> 		struct idr_layer *p = get_from_free_list(&ida->idr);
> 		if (p)
> 			kmem_cache_free(idr_layer_cache, p);
> 	}
> 
> 	return 0;
> }
> EXPORT_SYMBOL(ida_get_new_above);
> 
> /**
>  * ida_get_new - allocate new ID
>  * @ida:	ida handle
>  * @p_id:	pointer to the allocated handle
>  *
>  * Allocate new ID.  It should be called with any required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the ida_pre_get() call.  If the ida is full, it will
>  * return -ENOSPC.
>  *
>  * @p_id returns a value in the range 0 ... 0x7fffffff.
>  */
> int ida_get_new(struct ida *ida, int *p_id)
> {
> 	return ida_get_new_above(ida, 0, p_id);
> }
> EXPORT_SYMBOL(ida_get_new);
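
The ida calling convention mirrors the idr one -- again a made-up sketch,
not taken from the patch:

	int error, id;

	do {
		if (!ida_pre_get(&my_ida, GFP_KERNEL))
			return -ENOMEM;
		spin_lock(&my_lock);		/* caller-provided exclusion */
		error = ida_get_new(&my_ida, &id);
		spin_unlock(&my_lock);
	} while (error == -EAGAIN);

Note that nothing in ida is read locklessly (there is no ida equivalent of
idr_find()), so the RCU changes matter here only through the shared
idr_layer handling.
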
> 
> /**
>  * ida_remove - remove the given ID
>  * @ida:	ida handle
>  * @id:		ID to free
>  */
> void ida_remove(struct ida *ida, int id)
> {
> 	struct idr_layer *p = ida->idr.top;
> 	int shift = (ida->idr.layers - 1) * IDR_BITS;
> 	int idr_id = id / IDA_BITMAP_BITS;
> 	int offset = id % IDA_BITMAP_BITS;
> 	int n;
> 	struct ida_bitmap *bitmap;
> 
> 	/* clear full bits while looking up the leaf idr_layer */
> 	while ((shift > 0) && p) {
> 		n = (idr_id >> shift) & IDR_MASK;
> 		__clear_bit(n, &p->bitmap);
> 		p = p->ary[n];

OK, the caller presumably holds the update-side lock.

> 		shift -= IDR_BITS;
> 	}
> 
> 	if (p == NULL)
> 		goto err;
> 
> 	n = idr_id & IDR_MASK;
> 	__clear_bit(n, &p->bitmap);
> 
> 	bitmap = (void *)p->ary[n];

OK, the caller presumably holds the update-side lock.

> 	if (!test_bit(offset, bitmap->bitmap))
> 		goto err;
> 
> 	/* update bitmap and remove it if empty */
> 	__clear_bit(offset, bitmap->bitmap);
> 	if (--bitmap->nr_busy == 0) {
> 		__set_bit(n, &p->bitmap);	/* to please idr_remove() */
> 		idr_remove(&ida->idr, idr_id);
> 		free_bitmap(ida, bitmap);
> 	}
> 
> 	return;
> 
>  err:
> 	printk(KERN_WARNING
> 	       "ida_remove called for id=%d which is not allocated.\n", id);
> }
> EXPORT_SYMBOL(ida_remove);
> 
> /**
>  * ida_destroy - release all cached layers within an ida tree
>  * @ida:	ida handle
>  */
> void ida_destroy(struct ida *ida)
> {
> 	idr_destroy(&ida->idr);
> 	kfree(ida->free_bitmap);
> }
> EXPORT_SYMBOL(ida_destroy);
> 
> /**
>  * ida_init - initialize ida handle
>  * @ida:	ida handle
>  *
>  * This function is used to set up the handle (@ida) that you will pass
>  * to the rest of the functions.
>  */
> void ida_init(struct ida *ida)
> {
> 	memset(ida, 0, sizeof(struct ida));
> 	idr_init(&ida->idr);
> 
> }
> EXPORT_SYMBOL(ida_init);
