linux-kernel - Re: [RFC 00/10] implement alternative and much simpler id allocator

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87pokjjznr.fsf@rasmusvillemoes.dk>
Date:   Fri, 23 Dec 2016 00:46:00 +0100
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Matthew Wilcox <mawilcox@...rosoft.com>
Cc:     Tejun Heo <tj@...nel.org>,
        "linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Jens Axboe" <axboe@...nel.dk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "linux-block\@vger.kernel.org" <linux-block@...r.kernel.org>,
        "dri-devel\@lists.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>
Subject: Re: [RFC 00/10] implement alternative and much simpler id allocator

On Sat, Dec 17 2016, Matthew Wilcox <mawilcox@...rosoft.com> wrote:

> From: Matthew Wilcox
>> From: Rasmus Villemoes [mailto:linux@...musvillemoes.dk]
>> > This sounds good. I think there may still be a lot of users that never
>> > allocate more than a handful of IDAs, making a 128 byte allocation still
>> > somewhat excessive. One thing I considered was (exactly as it's done for
>> > file descriptor tables) to embed a single word in the struct ida and
>> > use that initially; I haven't looked closely at newIDA, so I don't know
>> > how easy that would be or if its worth the complexity.
>> 
>> Heh, I was thinking about that too.  The radix tree supports "exceptional
>> entries" which have the bottom bit set.  On a 64-bit machine, we could use 62
>> of the bits in the radix tree root to store the ID bitmap.  I'm a little wary of the
>> potential complexity, but we should try it out.
>
> Test patch here: http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-16
> It passes the test suite ... which I actually had to adjust because it
> now succeeds in cases where it hadn't (allocating ID 0 without
> preallocating), and it will now fail in cases where it hadn't
> previously (assuming a single preallocation would be enough).  There
> shouldn't be any examples of that in the kernel proper; it was simply
> me being lazy when I wrote the test suite.

Nice work! A few random comments/questions:

- It does add some complexity, but I think a few comments would make it
  more digestable.

- Hm, maybe I'm confused, and I certainly don't understand how the whole
  radix tree works. But do you use every leaf node as an exceptional
  entry initially, to allocate the first 62 ids from that level? This
  code

	if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) <
					BITS_PER_LONG) {
		bit += RADIX_TREE_EXCEPTIONAL_SHIFT;
		radix_tree_iter_replace(root, &iter, slot,
				(void *)((1UL << bit) |
				RADIX_TREE_EXCEPTIONAL_ENTRY));
		*id = new;
		return 0;
	}

   operates on bit, which is the offset from index*IDA_BITMAP_BITS, and
   it seems to create an exceptional entry somewhere down the tree
   (which may of course be the root).

   But we don't seem to allocate another bit from that exceptional entry
   ever unless it happened to sit at index 0; the code

	unsigned long tmp = (unsigned long)bitmap;
	if (start < BITS_PER_LONG) {
		unsigned tbit = find_next_zero_bit(&tmp,
					BITS_PER_LONG, start);
		if (tbit < BITS_PER_LONG) {
			tmp |= 1UL << tbit;
			rcu_assign_pointer(*slot, (void *)tmp);
			*id = new + tbit -
				RADIX_TREE_EXCEPTIONAL_SHIFT;
			return 0;
		}
	}

   is only used for small values of start (though it does seem to
   account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).
   
- In the micro-optimization department, I think we should avoid
  find_next_zero_bit on a single word, and instead use __ffs. Something
  like (assuming we can use bit instead of start)

  if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) {
    tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT);
    if (tmp) {
      tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT;
      tmp = (unsigned long)bitmap | (1UL << tbit);
      rcu_assign_pointer(*slot, (void *)tmp);
      *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT;
      return 0;
    }
  }

Rasmus