Message-ID: <20100610162453.GF19561@basil.fritz.box>
Date: Thu, 10 Jun 2010 18:24:53 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Ingo Molnar <mingo@...e.hu>
Cc: Andi Kleen <andi@...stfloor.org>,
Peter Zijlstra <peterz@...radead.org>,
Jason Baron <jbaron@...hat.com>, linux-kernel@...r.kernel.org,
mathieu.desnoyers@...ymtl.ca, hpa@...or.com, tglx@...utronix.de,
rostedt@...dmis.org, roland@...hat.com, rth@...hat.com,
mhiramat@...hat.com, fweisbec@...il.com, avi@...hat.com,
davem@...emloft.net, vgoyal@...hat.com, sam@...nborg.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>
Subject: Re: [PATCH 03/13] jump label v9: x86 support
On Thu, Jun 10, 2010 at 05:37:42PM +0200, Ingo Molnar wrote:
>
> > [...] It costs you in some benchmarks. [...]
>
> Microbenchmarks mostly, see below.
I didn't make these decisions, but I assume whoever made them had good reasons
and enough data on larger benchmarks too.
> > A much better way to get smaller kernel images is to do more __cold annotations
> > for slow paths. Newer gcc will then simply only do -Os for these functions.
>
> That's an opt-in method and we cannot reach the kinds of 30% code size
> reductions that -Os can achieve. Most code in the kernel is not cache-hot,
> even on microbenchmarks.
Maybe, maybe not. But yes it can be approached from both ways.
Personally I would prefer to simply write less bloated code to get
code reductions. Simpler code is often faster too.
>
> A much better model would be to actively mark hot codepaths with a __hot
> attribute instead. Then the code size difference can be considered on a case
> by case basis.
Yes that works too for those who still use -Os.
For example, marking the scheduler and a few mm hot paths this way would certainly make sense.
>
> And where GCC produces indefensibly crap code there GCC needs to be fixed.
> Crap code often increases size so the fix would increase the efficiency of
> -Os.
In some cases agreed, but in the common cases it's really: you asked for the
smallest, you got it, even if it's slow. It's not -Odwim.
One standard example here is a division by constant. The shortest way is
using DIV/IDIV if it's not 2^n and small enough, but it's really quite slow
in hardware. If you spend a few more bytes you can do much better for a wide
range of constants.
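To illustrate the trade-off, here is a minimal sketch in C of the
multiply-and-shift form that can replace a hardware divide. The function
names and the divisor 1000 are assumptions picked for illustration, not
taken from the kernel or from actual gcc output. The magic constant is
ceil(2^38 / 1000) and is only valid for 32-bit unsigned inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Plain form: when optimizing for size on x86 this may become a DIV. */
uint32_t div1000_plain(uint32_t x)
{
	return x / 1000u;
}

/*
 * Reciprocal form: x / 1000 == (x * ceil(2^38 / 1000)) >> 38 for every
 * 32-bit unsigned x. A few more code bytes, but no hardware divide.
 */
uint32_t div1000_mulshift(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0x10624DD3u) >> 38);
}

/* Hypothetical helper: spot-check that both forms agree on edge values. */
void div1000_check(void)
{
	static const uint32_t samples[] = {
		0u, 1u, 999u, 1000u, 1999u, 123456789u, 0xFFFFFFFFu
	};
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(div1000_plain(samples[i]) == div1000_mulshift(samples[i]));
}
```

Comparing the assembly for the two functions at -Os and -O2 (gcc -S) should
show which form the compiler picks at each level.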
Most likely we would need a new -O flag to avoid such cases.
BTW I experimented with marking a few common cases like this (e.g. time unit
conversion) hot, but gcc currently has trouble with __hot on inlines. So you
would always need to mark the caller.
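A rough sketch of that workaround, assuming gcc's "hot" and "cold" function
attributes. The macro and function names below are made up for illustration;
they are not existing kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins (hypothetical names) wrapping gcc function attributes. */
#define sketch_hot	__attribute__((hot))
#define sketch_cold	__attribute__((cold))

/* Small inline helper: the annotation is not placed here. */
static inline uint64_t sketch_ns_to_us(uint64_t ns)
{
	return ns / 1000u;
}

/* The out-of-line caller carries the hot annotation instead. */
sketch_hot uint64_t sketch_elapsed_us(uint64_t start_ns, uint64_t end_ns)
{
	return sketch_ns_to_us(end_ns - start_ns);
}

/* A slow path marked cold, so gcc may optimize it for size. */
sketch_cold int sketch_report_error(int code)
{
	return -code;
}

/* Hypothetical helper: sanity check for the sketch. */
void sketch_check(void)
{
	assert(sketch_elapsed_us(1000u, 5000u) == 4u);
	assert(sketch_report_error(5) == -5);
}
```

Whether the inlined division actually gets the speed-optimized expansion
depends on the gcc version, so the generated code is worth checking.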
-Andi
--
ak@...ux.intel.com -- Speaking for myself only.