Message-ID: <alpine.LFD.2.00.0901111458290.6528@localhost.localdomain>
Date: Sun, 11 Jan 2009 15:05:53 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andi Kleen <andi@...stfloor.org>
cc: David Woodhouse <dwmw2@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Harvey Harrison <harvey.harrison@...il.com>,
"H. Peter Anvin" <hpa@...or.com>,
Chris Mason <chris.mason@...cle.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
paulmck@...ux.vnet.ibm.com, Gregory Haskins <ghaskins@...ell.com>,
Matthew Wilcox <matthew@....cx>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Nick Piggin <npiggin@...e.de>,
Peter Morreale <pmorreale@...ell.com>,
Sven Dietrich <SDietrich@...ell.com>, jh@...e.cz
Subject: Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement
adaptive spinning
On Sun, 11 Jan 2009, Linus Torvalds wrote:
> On Sun, 11 Jan 2009, Andi Kleen wrote:
> >
> > Was -- i think that got fixed in gcc. But again only in newer versions.
>
> I doubt it. People have said that about a million times, it has never
> gotten fixed, and I've never seen any actual proof.
In fact, I just double-checked.
Try this:
	struct a {
		unsigned long array[200];
		int a;
	};

	struct b {
		int b;
		unsigned long array[200];
	};

	extern int fn3(int, void *);
	extern int fn4(int, void *);

	static inline __attribute__((always_inline)) int fn1(int flag)
	{
		struct a a;
		return fn3(flag, &a);
	}

	static inline __attribute__((always_inline)) int fn2(int flag)
	{
		struct b b;
		return fn4(flag, &b);
	}

	int fn(int flag)
	{
		if (flag & 1)
			return fn1(flag);
		return fn2(flag);
	}
(yeah, I made sure it would inline with "always_inline" just so that the
issue wouldn't be hidden by any "avoid stack frames" flags).
Gcc creates a big stack frame that contains _both_ 'a' and 'b', and does
not merge the allocations together even though they clearly have no
overlap in usage. Both 'a' and 'b' get 201 long-words (1608 bytes) of
stack, causing the inlined version to have 3kB+ of stack, even though the
non-inlined one would never use more than half of it.
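
A minimal sketch of the arithmetic above, assuming nothing beyond the two struct definitions from the example. The helper names (frame_separate_slots, frame_shared_slot, check_frame_arithmetic) are made up for illustration. It only computes sizes and does not inspect what any particular gcc version actually emits; the 1608-byte figure corresponds to an 8-byte unsigned long.

```c
#include <assert.h>
#include <stddef.h>

struct a {
	unsigned long array[200];
	int a;
};

struct b {
	int b;
	unsigned long array[200];
};

/* Bytes needed when each inlined local keeps its own stack slot. */
static size_t frame_separate_slots(void)
{
	return sizeof(struct a) + sizeof(struct b);
}

/* Bytes needed if both locals shared one slot (their uses never overlap). */
static size_t frame_shared_slot(void)
{
	return sizeof(struct a) > sizeof(struct b) ? sizeof(struct a)
						   : sizeof(struct b);
}

/* Hypothetical sanity check of the numbers quoted in the mail. */
static void check_frame_arithmetic(void)
{
	/* 200 longs plus an int, rounded up to long alignment: 201 long-words. */
	assert(sizeof(struct a) == 201 * sizeof(unsigned long));
	assert(sizeof(struct b) == 201 * sizeof(unsigned long));

	/* Separate slots cost twice what a shared slot would. */
	assert(frame_separate_slots() == 2 * frame_shared_slot());
}
```

With an 8-byte unsigned long, frame_separate_slots() is 3216 bytes and frame_shared_slot() is 1608 bytes, which is the "3kB+" versus "half of it" comparison.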
So please stop claiming this is fixed. It's not fixed, never has been, and
quite frankly, probably never will be because the lifetime analysis is
hard enough (ie once you inline and there is any complex usage, CSE etc
will quite possibly mix up the lifetimes - the above is clearly not any
_realistic_ example).
So even if the above trivial case could be fixed, I suspect a more complex
real-life case would still keep the allocations separate. Because merging
the allocations and re-using the same stack for both really is pretty
non-trivial, and the best solution is to simply not inline.
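
One way to "simply not inline" can be sketched with gcc's noinline function attribute, so that each helper keeps its large local in its own frame and fn() itself carries neither. This is an illustration under assumptions, not a recommendation from the mail: the bodies of fn3 and fn4 are placeholder stubs invented here so the example links, since the original only declares them extern.

```c
struct a {
	unsigned long array[200];
	int a;
};

struct b {
	int b;
	unsigned long array[200];
};

/* Placeholder stubs (assumption): the original only declares these. */
int fn3(int flag, void *buf)
{
	(void)buf;
	return flag + 3;
}

int fn4(int flag, void *buf)
{
	(void)buf;
	return flag + 4;
}

/* noinline asks gcc to keep 'a' in fn1's own frame, not the caller's. */
static __attribute__((noinline)) int fn1(int flag)
{
	struct a a;
	return fn3(flag, &a);
}

/* Likewise, 'b' stays in fn2's frame. */
static __attribute__((noinline)) int fn2(int flag)
{
	struct b b;
	return fn4(flag, &b);
}

/* Only one of the two helper frames exists at any time. */
int fn(int flag)
{
	if (flag & 1)
		return fn1(flag);
	return fn2(flag);
}
```

To see the effect on a given compiler, compare the generated assembly for fn() with and without the attribute. gcc's -Wframe-larger-than= warning can also flag oversized frames, though which versions support it should be checked against the gcc documentation.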
(And yeah, the above is such an extreme case that, without the
"always_inline", gcc seems to realize that it makes no sense to inline
because the stack frame is _so_ big. I don't know what the default stack
frame limit is, but it's apparently smaller than 1.5kB ;)
Linus