Date:   Fri, 13 Oct 2017 15:40:43 +0000
From:   Matthew Wilcox <mawilcox@...rosoft.com>
To:     Wei Yang <richard.weiyang@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] radix-tree: get_slot_offset() returns invalid offset when
 parent is NULL

From: Wei Yang [mailto:richard.weiyang@...il.com]
> On Wed, Oct 11, 2017 at 04:39:40PM -0700, Andrew Morton wrote:
> >On Wed, 11 Oct 2017 10:33:39 +0800 Wei Yang
> >> BTW, I have another question on the implementation of radix tree.
> >>
> >> #Question: why 6 is one of the choices for RADIX_TREE_MAP_SHIFT
> >>
> >> Currently, the shift of the tree is defined as below.
> >>
> >>     #define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
> >>
> >> The index in the tree has type unsigned long, which means its size
> >> is 32 bits or 64 bits.
> >>
> >>     32 % 4 = 0, 64 % 4 = 0
> >>
> >>     32 % 6 = 2, 64 % 6 = 4
> >>
> >> This means when the tree shift is 6 and represents the max index,
> >> there would be some unused slots at the top level.
> >>
> >> So why did we choose 6 instead of a power of 2?
> >>
> >> The original commit is 139e561660fe11e0fc35e142a800df3dd7d03e9d, in
> >> which I don't see any reason given for this choice.
> >>
> >> If possible, I suggest changing this value to a power of 2. Or maybe
> >> I have missed some critical reason behind this.
> >>
> >
> >I'm not sure I really see the problem.  1<<6 is still a power of 2.
> >radix_tree_split_preload() is doing mod and div on shift distances
> >which hurts my brain a bit.  But CONFIG_BASE_SMALL=0 is what most
> >people will be using so presumably the current code tests out OK.
> >Although simple memory wastage would be hard to detect.  Matthew has
> >been here recently and perhaps can comment?
> 
> Hmm, I may not have stated it clearly.
> 
> What I mean is to make RADIX_TREE_MAP_SHIFT itself a power of 2; the
> result of (1 << RADIX_TREE_MAP_SHIFT) is of course already one.
> 
> Let me explain with a simplified example. Assume the index is 16 bits wide.
> 
> Case 1, shift is 4
> 
>        Each 4 bits form a group, giving exactly 4 groups.
> 
>        |0123|4567|0123|4567|
> 
> Case 2, shift is 6
> 
>        Each 6 bits form a group, leaving 3 groups with 2 empty
>        positions at the beginning.
> 
>        |  0123|456701|234567|
> 
> The radix tree is a little like a page table, which is divided into
> several levels, with each level indexed by a group of bits.
> 
>     When the shift is a power of 2, the index bits divide evenly into
>     groups, so nothing is wasted.
> 
>     When the shift is not, several of the most significant bits are
>     left over, which means some slots at the top level will never be
>     used.
> 
>     This means that, in theory, a tree with shift 6
> 
>        |  0123|456701|234567|
> 
>     could store an index with value
> 
>         ffffff ffffff ffffff
> 
>     while the largest value a 16-bit index can actually represent is
> 
>         __ffff ffffff ffffff
> 
> Since the radix tree will not allocate memory when there is no
> corresponding index, this just leaves several unused slots at the top
> level.

Yes, if we have a tree with 2^64 pointers in it, we'll have 48 unused pointers in the top-level node.  That's not a huge percentage of waste.
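
For concreteness, here is the arithmetic behind that figure as a small
userspace sketch (illustrative only, not kernel code): covering a 64-bit
index with a shift of 6 takes 11 levels, so the top level only has
64 - 10*6 = 4 bits left to decode and uses 16 of its 64 slots.

    #include <stdio.h>

    int main(void)
    {
        const int index_bits = 64;         /* bits in an unsigned long */
        const int shift = 6;               /* RADIX_TREE_MAP_SHIFT */
        const int map_size = 1 << shift;   /* 64 slots per node */

        /* Levels needed to cover the whole index space */
        int levels = (index_bits + shift - 1) / shift;     /* 11 */
        /* Bits left over for the top level to decode */
        int top_bits = index_bits - (levels - 1) * shift;  /* 4 */
        int used = 1 << top_bits;                          /* 16 */

        printf("levels=%d used=%d unused=%d\n",
               levels, used, map_size - used);             /* unused=48 */
        return 0;
    }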

> Still, I would prefer to fully utilize the whole tree instead of
> leaving some never-touched slots, and this can be achieved by setting
> RADIX_TREE_MAP_SHIFT to a power of 2.
> 
> So if there is no specific reason to use 6, I would like to send a
> patch changing it to 8.

The cost of that is in the slab allocator.  With 64-bit pointers and a 64-entry node, we use 576 bytes per node, fitting 7 nodes per page.  With 256 entries per node, the node balloons to just over 2kB, and we get 3 nodes per pair of pages.  This is massively less efficient than the inefficiency you’re complaining about.
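
The rough arithmetic behind those numbers, for reference (this sketch
assumes about 64 bytes of radix_tree_node header on top of the slots
array; the exact header size depends on configuration):

    #include <stdio.h>

    static void node_cost(int shift)
    {
        const int header = 64;            /* assumed node header size */
        int slots = 1 << shift;
        int size = header + slots * 8;    /* 8-byte slot pointers */

        printf("shift=%d: %d bytes/node, %d nodes/4kB page, %d nodes/8kB\n",
               shift, size, 4096 / size, 8192 / size);
    }

    int main(void)
    {
        node_cost(6);   /*  576 bytes -> 7 nodes per page          */
        node_cost(8);   /* 2112 bytes -> 3 nodes per pair of pages */
        return 0;
    }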
