Date:	Sun, 03 Jun 2012 18:16:18 -0500
From:	Daniel Santos <danielfsantos@....net>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	linux-kernel@...r.kernel.org
Subject: Generic rbtree: compare() vs less()

Peter,

I wanted to follow up with you on something you mentioned in IRC, as
well as in one of your emails.  The (primary) reason my code uses a
compare() function instead of less() is that it has to work with trees
that require unique keys.  In the fair scheduler, it doesn't matter because:
a.) duplicate keys are allowed
b.) you never need to do lookups.

If you had to do lookups, you would have to call less() twice:

if (less(p->key, key)) {
    p = p->right;
} else if (less(key, p->key)) {
    p = p->left;
} else {
    return p;
}

Using compare() means that you only call the function once per node, and
it allows for the possibility that some trees might have a more
complicated compare function.  Plus, gcc 4.5 and earlier don't inline
the compare function call. :(
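
For comparison, a lookup with compare() only needs one call per node.
Just a sketch, using the same p/key names as above (and ignoring whether
the keys are passed by value or by pointer):

long result = compare(key, p->key);  /* one call per node */

if (result > 0) {
    p = p->right;
} else if (result < 0) {
    p = p->left;
} else {
    return p;
}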

The reason I'm returning long from compare() instead of int is to
(hopefully) simplify the generated code for compare functions that need
to subtract two 64-bit values, without incurring additional overhead.
If I used int, then the compare function for a 64-bit value would have
to look like this:

int compare(u64 *a, u64 *b) {
    return *a > *b ? 1 : (*a < *b ? -1 : 0);
}
// or
int compare(u64 *a, u64 *b) {
    s64 diff = *a - *b;
    return diff > 0 ? 1 : (diff < 0 ? -1 : 0);
}

And that's a whole lot more instructions, unless the compiler can inline
the function, compare the two values, use the CPU's negative & zero
flags, and compile the int return value out entirely, which, sadly, just
doesn't seem to happen very often :( (especially on gcc 4.5 and earlier,
where it fails to inline the compare function call).
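
The branches in the int version are needed because simply truncating the
64-bit difference to int can flip the sign.  A quick illustration, with
made-up values and userspace types just to show the problem:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 0x100000000ULL;       /* a > b */
    uint64_t b = 1;
    int64_t diff = (int64_t)(a - b);   /* 0xffffffff -- positive */
    int truncated = (int)diff;         /* -1 on the usual targets: sign flipped */

    printf("diff=%lld truncated=%d\n", (long long)diff, truncated);
    return 0;
}

Returning long on a 64-bit system lets the signed difference be passed
straight through instead of being collapsed into -1/0/1 with branches.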

Incidentally, this is what I've done on my fair.c patch:

+static inline long compare_vruntime(u64 *a, u64 *b)
+{
+#if __BITS_PER_LONG >= 64
+       return (long)((s64)*a - (s64)*b);
+#else
+/* This is hacky, but is done to reduce instructions -- we won't use this for
+ * rbtree lookups, only inserts, and since our relationship is defined as
+ * non-unique, we only need to return positive if a > b and any other value
+ * means less than.
+ */
+       return (long)(*a > *b);
+#endif
+}

This should keep the instruction count pretty much what it was prior to
my patch, on 32-bit systems as well as 64-bit.
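
For illustration, here's roughly how a compare function like that plugs
into the stock open-coded rbtree insert.  This is just a sketch with
made-up names (demo_node, demo_insert), not what the generic code
actually expands to, but it shows that insertion also only needs one
compare call per node:

#include <linux/rbtree.h>
#include <linux/types.h>

/* Made-up example node; in fair.c this would be sched_entity/vruntime. */
struct demo_node {
        struct rb_node rb;
        u64 vruntime;
};

/* Uses compare_vruntime() as defined in the patch above. */
static void demo_insert(struct rb_root *root, struct demo_node *new)
{
        struct rb_node **link = &root->rb_node;
        struct rb_node *parent = NULL;

        while (*link) {
                struct demo_node *entry =
                        rb_entry(*link, struct demo_node, rb);

                parent = *link;
                /* One compare call per node: positive means "go right",
                 * anything else (the tree is non-unique) means "go left". */
                if (compare_vruntime(&new->vruntime, &entry->vruntime) > 0)
                        link = &(*link)->rb_right;
                else
                        link = &(*link)->rb_left;
        }

        rb_link_node(&new->rb, parent, link);
        rb_insert_color(&new->rb, root);
}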

Anyway, please let me know if you have any other ideas on this!  I'm
going to write up a Q&A or something that explains the reasons for some
of these seemingly odd decisions. (maybe I'll even find some better
alternatives after getting feedback)

Daniel
