lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 1 Jun 2017 06:50:08 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        live-patching@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [RFC PATCH 00/10] x86: undwarf unwinder

On Thu, Jun 1, 2017 at 5:47 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> On Thu, Jun 01, 2017 at 02:17:21PM +0200, Peter Zijlstra wrote:
>> On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
>> > > Being able to generate more optimal code in the hottest code paths of the kernel
>> > > is the _real_, primary upstream kernel benefit of a different debuginfo method -
>> > > which has to be weighed against the pain of introducing a new unwinder. But this
>> > > submission does not talk about that aspect at all, which should be fixed I think.
>> >
>> > Actually I devoted an entire one-sentence paragraph to performance in
>> > the documentation:
>> >
>> >   The simpler debuginfo format also enables the unwinder to be relatively
>> >   fast, which is important for perf and lockdep.
>> >
>> > But I'll try to highlight that a little more.
>>
>> That's relative to a DWARF unwinder.
>
> Yes.
>
>> It doesn't appear to be possible to get anywhere near a frame-pointer
>> unwinder due to having to do this log(n) lookup for every single
>> frame.
>
> Hm, is there something faster, yet not substantially bigger?  Hash?
> Trie?

You have, roughly, a set of (key_start, value) pairs where, for any
given key, you want to find the (key_start, value) with the largest
key_start that doesn't exceed key.  Binary search gives you log_2(n)
queries, but its locality of reference sucks.  Here are two
suggestions for improving it:

1. Change the data layout.  Instead of having an array of undwarf
entries, have two parallel arrays, one with the ip addresses and one
with everything else.  This has no effect on the amount of space used,
but it makes the part used during search more compact.

2. Your key space is fairly small and your table entries should be
reasonably uniformly distributed.  Let the first IP you have unwind
data for be IP0.  Make an array mapping (IP - IP0) / B to the index of
the unwind entry for that IP for some suitable block size B.  Then, to
look up an IP, you'd find the indices of the unwind entries for (IP -
IP0) / B and (IP - IP0) / B + 1 and binary search between them.  With
constant B, this gives you O(1) performance instead of O(log(n)).
With B = 1, it's very fast, but the table is huge.  With B = 64k or
so, maybe you'd get a nice tradeoff of speedup vs size.  (With
modules, you'd presumably first search an rbtree to find which
instance of this data structure you're using and then do the lookup.)

3. Use a B-tree.  B-trees are simple if you don't need to deal with
insertion and deletion.  Presumably you'd choose your internal node
size so each internal node is exactly 64 or 128 bytes for good cache
performance.  This is still O(log(n)) and it uses more comparisons
than binary search, but you touch many fewer cache lines.

I expect that, if you do #1 and #2, you'd get excellent performance at
very little cost to the complexity of the code.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ