Message-ID: <20170711084055.pfrzl5kql7coxsxn@gmail.com>
Date: Tue, 11 Jul 2017 10:40:56 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
live-patching@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2 4/8] objtool: add undwarf debuginfo generation
* Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> Anyway, I used some linker magic to temporarily move the unwinder code to the
> end of .text, so that unwinder changes don't add unexpected side effects to the
> microbenchmark behavior. Now I'm getting more consistent results: the packed
> struct is measuring ~2% slower. The slight slowdown might just be explained by
> the fact that GCC generates some extra instructions for extracting the fields
> out of the packed struct.
Yeah, 16-bit field accesses are more complex than accessing a zero-extended
32-bit field, even on x86, which has a fair amount of 16-bit legacy.
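To make the size-versus-access-cost trade-off concrete, here is a minimal
sketch in C. The layouts and the names entry_wide, entry_packed,
wide_cfa_offset, packed_cfa_offset, packed_cfa_reg and packed_bp_reg are
hypothetical, made up for illustration; this is not the actual undwarf
structure from the patch. It only shows why a packed entry is smaller and why
reading its fields takes a narrower, sign-extending load or an extra
shift/mask instead of a plain 32-bit load.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked layout: every field is a full 32-bit word. */
struct entry_wide {
	int32_t  cfa_offset;
	int32_t  bp_offset;
	uint32_t cfa_reg;
	uint32_t bp_reg;
};

/*
 * Hypothetical packed layout: 16-bit offsets, and two 4-bit register
 * ids sharing a single byte.
 */
struct entry_packed {
	int16_t cfa_offset;
	int16_t bp_offset;
	uint8_t regs;	/* low nibble: cfa_reg, high nibble: bp_reg */
} __attribute__((packed));

static_assert(sizeof(struct entry_packed) < sizeof(struct entry_wide),
	      "packed entry should be smaller than the wide one");

/* Plain 32-bit load. */
static inline int32_t wide_cfa_offset(const struct entry_wide *e)
{
	return e->cfa_offset;
}

/* 16-bit load that has to be sign-extended to 32 bits. */
static inline int32_t packed_cfa_offset(const struct entry_packed *e)
{
	return e->cfa_offset;
}

/* Byte load plus a mask to extract the low nibble. */
static inline uint32_t packed_cfa_reg(const struct entry_packed *e)
{
	return e->regs & 0xf;
}

/* Byte load plus a shift to extract the high nibble. */
static inline uint32_t packed_bp_reg(const struct entry_packed *e)
{
	return e->regs >> 4;
}
```

The sizes in this sketch are not meant to reproduce the savings measured in
this thread; the real numbers depend on the actual field layout.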
> In the meantime, I found a ~10% speedup by making the "fast lookup table" block
> size a power-of-two (256) to get rid of the need for a slow 'div' instruction.
>
> I think I'm done performance tweaking for now. I'll keep the packed struct, and
> add the code for the 'div' removal, and hope to submit v3 soon.
Sounds good to me!
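For readers following along, the 'div' removal can be sketched roughly as
below. This is only an illustration under assumptions: the names
LOOKUP_BLOCK_ORDER, LOOKUP_BLOCK_SIZE, block_index_div and block_index_shift
are hypothetical and not taken from the patch. The point is that when the
block size is a runtime value the compiler has to emit a real division, while
a fixed power-of-two block size (256 here) lets the block index be computed
with a single shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: 2^8 = 256 bytes of text per lookup block. */
#define LOOKUP_BLOCK_ORDER	8
#define LOOKUP_BLOCK_SIZE	(1UL << LOOKUP_BLOCK_ORDER)

static_assert((LOOKUP_BLOCK_SIZE & (LOOKUP_BLOCK_SIZE - 1)) == 0,
	      "block size must be a power of two");

/*
 * Block size only known at run time: the compiler cannot strength-reduce
 * the division, so this typically compiles to a 'div' instruction.
 */
static inline unsigned long block_index_div(unsigned long ip,
					    unsigned long text_start,
					    unsigned long block_size)
{
	return (ip - text_start) / block_size;
}

/* Fixed power-of-two block size: the division becomes a right shift. */
static inline unsigned long block_index_shift(unsigned long ip,
					      unsigned long text_start)
{
	return (ip - text_start) >> LOOKUP_BLOCK_ORDER;
}
```

Both helpers return the same index whenever block_size is 256; only the
generated code differs.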
A ~2% slowdown in exchange for ~30% RAM savings, on a debug data structure that
is about as large as a typical kernel's total .text, is a decent trade-off.
Thanks,
Ingo