Message-ID: <CA+55aFzpNtLSQvcOOut--2UkkMyqSb9e0_VO3JAxvZRTv7YT1g@mail.gmail.com>
Date:	Fri, 22 Jun 2012 11:52:43 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hagen Paul Pfeifer <hagen@...u.net>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [GIT PULL] perf fixes

On Fri, Jun 22, 2012 at 11:38 AM, Hagen Paul Pfeifer <hagen@...u.net> wrote:
>>
>>Because that mcount thing is expensive as hell, if people haven't
>>noticed (and I'm not talking about just the call instruction that I
>>think we can stub out - it changes code generation in other ways too).
>>And it looks like distros enable it by default, which annoys my
>>performance-optimizing soul deeply.
>
> Isn't it stubbed out already? It is already replaced by nops at boot time by
> ftrace_code_disable() and friends! But yes, there may be spots where the
> additional mcount() call prevents optimization.

So even stubbed out, it's quite noticeable. The call causes the
function prologue to change quite a bit.

That's actually especially true with newer versions of gcc that
*finally* seem to have done the "don't always generate the full
prologue if some case doesn't need it" optimization. So functions that
have early-out conditions (quite common) will exit before even having
done the prologue, and without doing the whole frame pointer setup
etc.

Except if mcount generation is on. Then gcc will always do the
prologue and frame pointer setup before doing the mcount, because
mcount wants it.

So it really isn't just the extra call instruction.
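
To make the point above concrete, here is a minimal sketch of the kind of function being described: one with a cheap early-out check in front of the real work. The struct, the helper and the function names are hypothetical and only serve as an illustration; they are not taken from the kernel. The optimization described above appears to be what GCC calls shrink-wrapping (-fshrink-wrap), though the exact code generated depends on compiler version and target.

```c
#include <stddef.h>

/* Hypothetical example type, not from the kernel. */
struct counter {
	long hits;
	long limit;
};

/*
 * Hypothetical slow-path helper. Kept out of line so that the caller
 * really needs a call, and therefore a proper prologue, on that path.
 */
__attribute__((noinline))
static long counter_update(struct counter *c)
{
	if (c->hits < c->limit)
		c->hits++;
	return c->hits;
}

/*
 * Function with an early-out condition. The NULL check needs no stack
 * frame, so a compiler that delays the prologue can return from the
 * first branch without setting one up. With mcount generation enabled,
 * the prologue is expected to run before the check instead.
 */
long counter_hit(struct counter *c)
{
	if (!c)
		return -1;		/* early exit, no real work done */

	return counter_update(c) + 1;	/* slow path, makes a call */
}
```

One way to see the difference, assuming the sketch is saved as early_out.c, is to compare the assembly from "gcc -O2 -S early_out.c" with that from "gcc -O2 -pg -S early_out.c" and look at what happens before the NULL check in counter_hit().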

I may be more sensitive to this than most, because I look at profiles
and the function prologue just looks very ugly with the call mcount
thing. Ugh.

                  Linus