linux-kernel - Re: Generic Red-Black Trees (status update)

Message-ID: <4FC02E07.8070703@att.net>
Date:	Fri, 25 May 2012 20:12:39 -0500
From:	Daniel Santos <danielfsantos@....net>
To:	Andi Kleen <andi@...stfloor.org>
CC:	linux-kernel@...r.kernel.org
Subject: Re: Generic Red-Black Trees (status update)


> Daniel Santos <danielfsantos@....net> writes:
>
>> For anybody that's keeping up with this, I've gone through multiple
>> iterations and tests with 9 different gcc versions and concluded that
>> the search, insert & remove cores need to be coded in rbtree.h, using
>> the traditional interface (i.e., passing struct rb_node & rb_root
>> pointers instead of pointers to your specific object types).  The reason
>> is that gcc can't handle the cool fully-generic code until 4.6.  In gcc
>> 4.5.x, optimization completely breaks expanding the inline functions
> Can you post details?
Well, I suppose part of this is my own value judgment of what is a
"clean" implementation.  By this, I mean balancing these requirements:
1.) minimal dependence on pre-processor
2.) avoiding pre-processor expanded code that will break debug
information (backtraces)
3.) optimal encapsulation of the details of your rbtree in minimal
source code (this is where you define the relationship between your
container and contained objects, their types, keys, whether or not
non-unique objects are allowed, etc.) -- preferably eliminating
duplication of these details entirely.
4.) offering a complete feature-set in a single implementation (not
multiple functions when various features are used)
5.) perfect optimization -- the generic function must be exactly as
efficient as the hand-coded version
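
Requirements 3 and 5 pull together only if the compiler inlines the generic
code and treats the relationship details as compile-time constants. The
following is a minimal userspace sketch of that idea, not the actual rbtree
code: struct relationship, generic_key(), generic_may_insert(), struct item
and the ALWAYS_INLINE macro are all hypothetical names made up for
illustration (the kernel spells the attribute __always_inline).

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Hypothetical description of one container/object relationship. */
struct relationship {
        size_t key_offset;      /* where the int key lives in the object */
        bool unique;            /* are duplicate keys rejected? */
};

/* Generic, non-type-safe core: fetch the key through the offset. */
static ALWAYS_INLINE int generic_key(const struct relationship *rel,
                                     const void *obj)
{
        return *(const int *)((const char *)obj + rel->key_offset);
}

/* Generic check: may candidate sit next to existing in the container? */
static ALWAYS_INLINE bool generic_may_insert(const struct relationship *rel,
                                             const void *existing,
                                             const void *candidate)
{
        if (!rel->unique)       /* can fold away when rel is a constant */
                return true;
        return generic_key(rel, existing) != generic_key(rel, candidate);
}

struct item {
        int pad;
        int key;
};

/* The relationship is stated once, as constant data. */
static const struct relationship unique_items = {
        .key_offset = offsetof(struct item, key),
        .unique = true,
};
static const struct relationship multi_items = {
        .key_offset = offsetof(struct item, key),
        .unique = false,
};

/* Thin type-safe wrappers around the generic core. */
bool item_may_insert(const struct item *existing, const struct item *candidate)
{
        return generic_may_insert(&unique_items, existing, candidate);
}

bool multi_may_insert(const struct item *existing, const struct item *candidate)
{
        return generic_may_insert(&multi_items, existing, candidate);
}
```

When the call is inlined and the relationship is seen as a constant, an
optimizing compiler can reduce item_may_insert() to a single comparison and
multi_may_insert() to "return true". If either step fails, every caller
carries the fully general code, which is the kind of bloat at issue here.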

So by those standards, the cleanest implementation I've come up with
uses a macro to define an anonymous interface struct something like this:

/* generic non-type-safe function */
static __always_inline void *__generic_func(void *obj);

/* macro to generate type-safe interface object (in practice, the real one
 * defines all the functions in the interface, but I'm keeping it simple for
 * brevity)
 */
#define INTERFACE_A(name, in_type, out_type)                            \
struct {                                                                \
        out_type *(*const func)(in_type *obj);                          \
} name = {                                                              \
        .func = (out_type *(*const)(in_type *obj))__generic_func,       \
}

/* usage looks like this: */
INTERFACE_A(solution_a, struct something, struct something_else);
struct something *s;
struct something_else *se;
se = solution_a.func(s);

Calling solution_a.func(s) optimizes perfectly in 4.6, while in 4.5 and
prior, the call by struct-member-function-pointer is never inlined and
nothing passed to it is ever considered a compile-time constant.
Because of the implementation of the generic functions, it bloats the
code unacceptably (3x larger).  The following alternative works prior to
4.6, but with different syntax:

/* IMO, this solution is uglier and will break backtraces. */
#define INTERFACE_B(name, in_type, out_type)                    \
static __always_inline out_type * name##_func(in_type *obj)     \
{                                                               \
        return (out_type *)__generic_func(obj);                 \
}

/* now you call solution_b_func(s) instead of solution_a.func(s) */

>> into huge bloated  monsters.  Also, while I'm re-coding it all, I'm
>> adding find_near & insert_near, for more efficient insertion & retrieval
>> when you already have a node that should be close to the one you want
>> (which is often the case when inserting many objects at once).
>>
>> So after I'm done with this, I'll start on a new header file (grbtree.h
>> probably) using the "grb_" prefix for its functions that implements the
>> gcc 4.6.x+ fully generic & type safe interface, but using cute
>> pre-processor tricks for pre-4.6.x compatibility (basically, something
>> to consider using once gcc 4.6+ is more widely used).
> That doesn't make sense. Either it's used or it's not used,
> but if it's available it should work with all compilers.
>
> Otherwise you would end up with drivers or subsystems that
> are compiler specific.
>
> It's ok to be somewhat slower or bigger on older compilers.
You have a good point here, although I'm not sure that a 3x larger
function is an acceptable performance hit for a compiler as recent as
4.5.  Perhaps it's best to just implement it using the INTERFACE_B style
above, accept the minor loss of backtrace-ability and pre-processor
ugliness and get on with it.  There's no advantage to having two
competing syntaxes for usage.  I'll post the full details with patch
tomorrow.

Daniel
