Message-ID: <AANLkTikz+vJGFuysDXAdVb33q1q3L547dXNJa9NmeqeM@mail.gmail.com>
Date:	Tue, 22 Mar 2011 09:59:59 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Pekka Enberg <penberg@...nel.org>, Jesper Juhl <jj@...osbits.net>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Daniel Lezcano <daniel.lezcano@...e.fr>,
	Eric Paris <eparis@...hat.com>,
	Roman Zippel <zippel@...ux-m68k.org>,
	linux-kbuild@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH][RFC][resend] CC_OPTIMIZE_FOR_SIZE should default to N

On Tue, Mar 22, 2011 at 3:27 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> If that situation has changed - if GCC has regressed in this area then a commit
> changing the default IMHO gains a lot of credibility if it is backed by careful
> measurements using perf stat --repeat or similar tools.

Also, please don't back up any numbers for the "-O2 is faster than
-Os" case with some benchmark that is hot in the caches.

The thing is, many optimizations that make the code larger look really
good if there are no cache misses, and the code is run a million times
in a tight loop.

But kernel code in particular tends to not be like that. Yes, there
are cases where we spend 75% of the time in the kernel (my own
personal favorite is "git diff") basically having user space loop
around just one single operation. But it is _really_ quite rare in
real life. Most of the time, user space will blow the kernel caches
out of the water, and the kernel loops will be on the order of a few
entries (e.g. a "loop" may be the loop around a pathname lookup, which
iterates over three path components). Not millions.
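
To make the "few entries" point concrete, here is a minimal sketch of
the kind of loop meant above. It is not the kernel's lookup code;
count_path_components() is a hypothetical helper that only counts how
many times such a per-component loop would iterate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the non-empty components of a slash-separated path.
 * Hypothetical helper for illustration only: the real kernel
 * pathname walk does far more work per component.
 */
size_t count_path_components(const char *path)
{
    size_t n = 0;

    assert(path != NULL);
    while (*path) {
        /* Skip any run of separators. */
        while (*path == '/')
            path++;
        if (!*path)
            break;
        /* One component: consume it up to the next separator. */
        n++;
        while (*path && *path != '/')
            path++;
    }
    return n;
}
```

For a path like "/usr/bin/git" the loop body runs three times, so any
optimization that only pays off after many iterations never gets the
chance.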

The rule-of-thumb should be simple: 10% larger code likely means 10%
more I$ misses. Does the larger -O2 code make up for it?
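
One way to put numbers on that rule of thumb is a toy break-even
model. Everything in it is an assumption made for illustration: the
function name, the linear cost model, and the idea that misses scale
directly with code size. It only shows how much faster the
cache-hot path would have to get before larger code pays for itself.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not a measurement):
 *   t_Os = hot_cycles + misses * miss_penalty
 *   t_O2 = hot_cycles * (1 - gain) + misses * (1 + growth) * miss_penalty
 * Setting t_O2 == t_Os and solving for gain gives the value returned
 * here: the fraction of hot-path cycles -O2 must save to break even.
 */
double break_even_gain(double hot_cycles, double misses,
                       double miss_penalty, double growth)
{
    assert(hot_cycles > 0.0);
    return growth * misses * miss_penalty / hot_cycles;
}
```

With made-up figures of 1000 hot cycles, 20 I$ misses, a 100-cycle
miss penalty and 10% code growth, the extra misses cost 200 cycles, so
-O2 would need to shave roughly 20% off the hot path just to tie.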

Now, the downside of -Os has always been that it's not all that widely
used, so we've hit compiler bugs several times. That's been almost
enough to make me think that it's not worth it. But currently I don't
think we have any known issues, and, probably exactly _because_ we use
-Os, gcc doesn't seem to have that many regressions. It was much more
painful when we started trying to use -Os.

(That said, gcc -Os isn't all that wonderful. It tends to sometimes
generate really crappy code just because it's smaller, i.e. using a
multiply instruction in a critical code window just because doing a
few shifts and adds is larger. And that can be _so_ much slower that
it really hurts. So we might be better off with a model where we can
say "this code is important and really core kernel code that everybody
uses, do -O2 for this", and just compile _most_ of the kernel with
-Os)
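
As a sketch of the multiply-versus-shifts trade-off mentioned above,
here are the two shapes written out by hand for a multiply by 10. The
function names are hypothetical, and the compiler is free to turn
either form into the other depending on the optimization level and
target; the code only shows that the two are equivalent, not which one
any given gcc will emit.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 as a single multiply: usually the shorter encoding. */
uint32_t times10_mul(uint32_t x)
{
    return x * 10u;
}

/* The same value as shifts and an add: x*8 + x*2. More instructions,
 * but on some CPUs lower latency than a multiply. */
uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Both forms wrap modulo 2^32, so they agree for every input. */
void check_times10(uint32_t x)
{
    assert(times10_mul(x) == times10_shift_add(x));
}
```

Which form is faster depends on the CPU; the size difference is what
-Os optimizes for.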

                             Linus