linux-kernel - RE: [RFC 00/10] implement alternative and much simpler id allocator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <BY2PR21MB003659BCA2EC02603245CD2ACB950@BY2PR21MB0036.namprd21.prod.outlook.com>
Date:   Fri, 23 Dec 2016 17:03:51 +0000
From:   Matthew Wilcox <mawilcox@...rosoft.com>
To:     Rasmus Villemoes <linux@...musvillemoes.dk>
CC:     Tejun Heo <tj@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Jens Axboe" <axboe@...nel.dk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>
Subject: RE: [RFC 00/10] implement alternative and much simpler id allocator

From: Rasmus Villemoes [mailto:linux@...musvillemoes.dk]
> Nice work! A few random comments/questions:
> 
> - It does add some complexity, but I think a few comments would make it
>   more digestable.

I'm open to adding some comments ... I need some time between writing the code and writing the comments to be sure what comments are useful.

> - Hm, maybe I'm confused, and I certainly don't understand how the whole
>   radix tree works. But do you use every leaf node as an exceptional
>   entry initially, to allocate the first 62 ids from that level? This
>   code

I do!  And that question tells me one useful comment to add!

> 	if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) <
> 					BITS_PER_LONG) {
> 		bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
> 		radix_tree_iter_replace(root, &iter, slot,
> 				(void *)((1UL << bit) |
> 				RADIX_TREE_EXCEPTIONAL_ENTRY));
> 		*id = new;
> 		return 0;
> 	}
> 
>    operates on bit, which is the offset from index*IDA_BITMAP_BITS, and
>    it seems to create an exceptional entry somewhere down the tree
>    (which may of course be the root).
> 
>    But we don't seem to allocate another bit from that exceptional entry
>    ever unless it happened to sit at index 0; the code
> 
> 	unsigned long tmp = (unsigned long)bitmap;
> 	if (start < BITS_PER_LONG) {
> 		unsigned tbit = find_next_zero_bit(&tmp,
> 					BITS_PER_LONG, start);
> 		if (tbit < BITS_PER_LONG) {
> 			tmp |= 1UL << tbit;
> 			rcu_assign_pointer(*slot, (void *)tmp);
> 			*id = new + tbit -
> 				RADIX_TREE_EXCEPTIONAL_SHIFT;
> 			return 0;
> 		}
> 	}
> 
>    is only used for small values of start (though it does seem to
>    account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).

Ahh.  You're reading the code wrong, and I wrote the code wrongly too, so it's clearly overly complex.  I was testing with 'start' set to 0, allocating N entries, and then freeing them.  If I'd set start to 1024 and allocated two entries, I'd've seen the failure.

Please see the top commit here ("Improve IDA exceptional entry handling"): http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-20


> - In the micro-optimization department, I think we should avoid
>   find_next_zero_bit on a single word, and instead use __ffs. Something
>   like (assuming we can use bit instead of start)
> 
>   if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) {
>     tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT);
>     if (tmp) {
>       tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
>       tmp = (unsigned long)bitmap | (1UL << tbit);
>       rcu_assign_pointer(*slot, (void *)tmp);
>       *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT;
>       return 0;
>     }
>   }

I'm certainly open to microoptimisations, but I'll have to think about this one for a bit.