linux-kernel - Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20170711084055.pfrzl5kql7coxsxn@gmail.com>
Date:   Tue, 11 Jul 2017 10:40:56 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        live-patching@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation


* Josh Poimboeuf <jpoimboe@...hat.com> wrote:

> Anyway, I used some linker magic to temporarily move the unwinder code to the 
> end of .text, so that unwinder changes don't add unexpected side effects to the 
> microbenchmark behavior.  Now I'm getting more consistent results: the packed 
> struct is measuring ~2% slower.  The slight slowdown might just be explained by 
> the fact that GCC generates some extra instructions for extracting the fields 
> out of the packed struct.

Yeah, the 16-bit field accesses versus a zero-extended 32-bit field are more 
complex to access even on x86 that has a fair amount of 16-bit legacy.

> In the meantime, I found a ~10% speedup by making the "fast lookup table" block 
> size a power-of-two (256) to get rid of the need for a slow 'div' instruction.
> 
> I think I'm done performance tweaking for now.  I'll keep the packed struct, and 
> add the code for the 'div' removal, and hope to submit v3 soon.

Sounds good to me!

~2% slowdown for ~30% RAM savings for a debug data structure that is about as 
large as a typical kernel's total .text is a decent trade-off.

Thanks,

	Ingo