Message-ID: <20120821170216.GM16230@one.firstfloor.org>
Date: Tue, 21 Aug 2012 19:02:16 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org,
x86@...nel.org, mmarek@...e.cz, linux-kbuild@...r.kernel.org,
JBeulich@...e.com, akpm@...ux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>, hubicka@....cz
Subject: Re: RFC: Link Time Optimization support for the kernel
> The other hope would be that if LTO is used by a high-profile
> project like the Linux kernel then the compiler folks might look
> at it and improve it.
Yes, definitely. I already got a lot of help from toolchain people.
>
> > A lot of the overhead on the larger builds is also some
> > specific gcc code that I'm working with the gcc developers on
> > to improve. So the 4x extreme case will hopefully go down.
> >
> > The large builds also currently suffer from too much memory
> > consumption. That will hopefully improve too, as gcc improves.
>
> Are there any LTO build files left around, blowing up the size
> of the build tree?
The objdir size increases from the intermediate information in the
objects, even though it's compressed. A typical LTO objdir is about 2.5x
as big as a non-LTO one.
[this will go down a bit with slim LTO; right now there is an unnecessary
copy of the non LTOed code too; but I expect it will still be
significantly larger]
There's also the TMPDIR problem. If you put /tmp in tmpfs and gcc
defaults to putting the intermediate files during the final link into
/tmp, the memory fills up even faster, because tmpfs is competing
with anonymous memory.
gcc 4.7 improved a lot over 4.6 here thanks to better partitioning;
with 4.6 I had some spectacular OOMs. 4.6 is no longer supported for
LTO.
I also hope tmpfs will get better algorithms eventually that make
this less likely.
Anyway, this can be overridden by setting TMPDIR to the object directory.
With TMPDIR set and a not too aggressive -j*, you should be ok with 4GB
of memory for most kernels. Only allyesconfig still suffers.
This was one of the reasons why I made LTO not the default for
allyesconfig.
> > so we'll hopefully see more gains over time. Essentially it
> > gives more power to the compiler.
> >
> > Long term it would also help the kernel source organization.
> > For example there's no reason with LTO to have gigantic
> > includes with large inlines, because cross file inlining works
> > in a efficient way without reparsing.
>
> Can the current implementation of LTO optimize to the level of
> inlining? A lot of our include file hell situation results from
Yes, it does cross file inlining. Maybe even a bit too much.
(Currently there are about 40% fewer static CALLs when LTOed.)
In fact some of the current workarounds limit it, so there may be
even more inlining in the future.
One side effect is that backtraces are harder to read. You'll
need to rely more on addr2line than before (or we may need
to make kallsyms smarter)
It only inlines inside a final binary though, as Avi mentioned,
so it's more useful inside a subsystem for modular kernels.
> If data structures could be encapsulated/internalized to
> subsystems and only global functions are exposed to other
> subsystems [which are then LTO optimized] then our include
> file dependencies could become a *lot* simpler.
Yes, long term we could have these benefits.
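To make the encapsulation idea concrete, here is a minimal sketch of
what such a layout could look like. All names (demo_counter and
friends) are hypothetical and not taken from the kernel; the three
"files" are shown in one translation unit only so the example compiles
on its own. Without LTO each accessor is a real call from the user
file; with LTO the compiler is in principle free to inline them even
though the struct layout never appears in a header.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- demo_counter.h (hypothetical): only the interface is visible ---- */
struct demo_counter;                        /* opaque to other subsystems */
struct demo_counter *demo_counter_new(void);
void demo_counter_inc(struct demo_counter *c);
long demo_counter_read(const struct demo_counter *c);
void demo_counter_free(struct demo_counter *c);

/* ---- demo_counter.c (hypothetical): the layout is private here ---- */
struct demo_counter {
	long value;
};

struct demo_counter *demo_counter_new(void)
{
	/* calloc zero-initialises, so the counter starts at 0 */
	return calloc(1, sizeof(struct demo_counter));
}

void demo_counter_inc(struct demo_counter *c)
{
	c->value++;
}

long demo_counter_read(const struct demo_counter *c)
{
	return c->value;
}

void demo_counter_free(struct demo_counter *c)
{
	free(c);
}

/* ---- demo_user.c (hypothetical): a caller in another subsystem ---- */
long demo_count_to(int n)
{
	struct demo_counter *c;
	long result;
	int i;

	assert(n >= 0);
	c = demo_counter_new();
	if (!c)
		return -1;	/* allocation failed */
	for (i = 0; i < n; i++)
		demo_counter_inc(c);	/* out-of-line call unless LTO inlines it */
	result = demo_counter_read(c);
	demo_counter_free(c);
	return result;
}
```

The caller only ever sees the forward declaration, so changing the
struct layout would not force a rebuild of demo_user.c.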
BTW I should add LTO does more than just inlining:
- Drop unused global functions and variables
(so may cut down on ifdefs)
- Detect type inconsistencies between files
- Partial inlining (inline only parts of a function like a test
at the beginning)
- Detect pure and const functions without side effects that can be more
aggressively optimized in the caller.
- Detect which global variables a call can clobber. Normally any call
to a global function has to assume all global variables could be
changed. With LTO information some of them can be cached in registers
over calls.
- Detect read only variables and optimize them
- Optimize arguments to global functions (drop unnecessary arguments,
optimize input/output etc.)
- Replace indirect calls with direct calls, enabling other
optimizations.
- Do constant propagation and specialization for functions. So if a
function is called commonly with a constant it can generate a special
variant of this function optimized for that. This still needs more tuning (and
currently the code size impact is on the largish side), but I hope
to eventually have e.g. a special kmalloc optimized for GFP_KERNEL.
It can also in principle inline callbacks.
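
As a rough illustration of the constant propagation and specialization
item above, here is a hand-written sketch of the kind of clone the
compiler could derive. The function names and flag values are made up
for the example (they are not the kernel's GFP flags), and the code the
compiler actually generates will differ; this only shows why a variant
fixed to one constant argument can drop the flag tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for allocation flags. */
#define DEMO_WAIT	0x1u
#define DEMO_IO		0x2u
#define DEMO_KERNEL	(DEMO_WAIT | DEMO_IO)

/* Generic version: every caller pays for the flag tests. */
size_t demo_alloc_cost(size_t size, unsigned int flags)
{
	size_t cost = size;

	if (flags & DEMO_WAIT)
		cost += 8;
	if (flags & DEMO_IO)
		cost += 4;
	return cost;
}

/*
 * Hand-written equivalent of a clone specialised for
 * flags == DEMO_KERNEL: both tests are known to be true,
 * so they fold into a single constant.
 */
size_t demo_alloc_cost_kernel(size_t size)
{
	return size + 12;
}

/* Check that the specialised variant agrees with the generic one. */
void demo_check(size_t size)
{
	assert(demo_alloc_cost(size, DEMO_KERNEL) ==
	       demo_alloc_cost_kernel(size));
}
```

Callers that pass DEMO_KERNEL as a compile-time constant could then be
redirected to the specialised variant, while everyone else keeps using
the generic one.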
-Andi
--
ak@...ux.intel.com -- Speaking for myself only.