Message-ID: <CA+55aFxhUDTW_Pa9-+jmXhNDDTy5nrkiSaswxRTHh7u+j8gnOA@mail.gmail.com>
Date: Mon, 5 Feb 2018 09:19:05 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Laight <David.Laight@...lab.com>
Cc: Linus Walleij <linus.walleij@...aro.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
"linux-gpio@...r.kernel.org" <linux-gpio@...r.kernel.org>
Subject: Re: [GIT PULL] pin control bulk changes for v4.16
On Mon, Feb 5, 2018 at 8:55 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> End result: opening a file - whether it exists or not - doesn't
> actually go down to the filesystem at all when things are cached. My
> kernel profiles also show that very clearly, there's absolutely no
> filesystem component to the build at all (but there is a noticeable
> VFS component to it, and __d_lookup_rcu is generally one of the
> hottest kernel functions along with the system call entry/exit code).
Note that when I do kernel profiles of kernel builds, I do it mostly
for the "everything is already built" case, so the real footprint for
much of my profiles is actually mostly "make" doing millions of
open/stat calls.
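As a rough illustration of why an already-built tree is mostly stat traffic: the up-to-date check amounts to comparing timestamps for every target/prerequisite pair. The following is only a minimal sketch, not how make is actually implemented; the function name needs_rebuild is hypothetical, and treating a missing prerequisite as "out of date" is a simplification.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if the target has to be rebuilt,
 * i.e. it is missing or older than its prerequisite, 0 otherwise.
 * Each call costs two stat() system calls, and a no-op build does
 * this for every file in the dependency graph.
 */
static int needs_rebuild(const char *target, const char *prereq)
{
    struct stat t, p;

    assert(target && prereq);

    if (stat(target, &t) != 0)
        return 1;   /* target missing: must build */
    if (stat(prereq, &p) != 0)
        return 1;   /* simplification: missing prerequisite counts as stale */

    /* Rebuild only if the prerequisite is strictly newer. */
    return p.st_mtime > t.st_mtime;
}
```

When everything is cached, each of those stat() calls is answered from the dentry cache, which fits with the profile being dominated by VFS lookup and syscall entry/exit rather than by any filesystem code.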
Because once you actually build things, the kernel is almost not
noticeable any more. It's all gcc. And people always say that it's
optimizations that are expensive, but from the profiling I've done of
user space, a _lot_ of time is spent in just parsing and reading the
data.
In fact, having just re-done this, the top function in profiling is
"_cpp_lex_token()" at 3.4% of overall time for my test kernel build.
That matches my experience from sparse: the real overhead in a
compiler is just the stupid lexing/parsing. Cache misses galore, and
there's nothing really smart you can do about it.
Once you get to optimization, you can do smart things like hash the
SSA representation to do CSE cheaply, and use other clever data structures. But
lexing and parsing the tree is reading text and allocating and
generating the internal representation, and it's just "work". Lots of
it.
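To make the "hash the SSA representation" point concrete, here is a minimal sketch of hash-based CSE. It is not taken from sparse or gcc; the toy IR (struct insn, enum op), the fixed-size table and all function names are assumptions made for illustration. The idea it relies on is that in SSA every value has exactly one definition, so two instructions with the same opcode and the same operand ids must compute the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: each instruction defines exactly one value id. */
enum op { OP_ADD, OP_MUL };

struct insn {
    enum op op;
    int lhs, rhs;   /* ids of the operand values */
    int result;     /* id of the value this instruction defines */
};

#define TABLE_SIZE 64   /* power of two, fixed size for the sketch */

struct cse_table {
    struct insn *slot[TABLE_SIZE];
    size_t used;
};

static unsigned hash_insn(const struct insn *i)
{
    unsigned h = (unsigned)i->op;

    h = h * 31u + (unsigned)i->lhs;
    h = h * 31u + (unsigned)i->rhs;
    return h & (TABLE_SIZE - 1);
}

/*
 * Returns the id of the value that holds this computation: the result
 * of an earlier identical instruction if there is one, otherwise the
 * instruction's own result (after recording it in the table).
 */
static int cse_lookup_or_insert(struct cse_table *t, struct insn *i)
{
    unsigned h = hash_insn(i);

    /* Keep one slot empty so the linear probe always terminates. */
    assert(t->used < TABLE_SIZE - 1);

    while (t->slot[h]) {
        struct insn *o = t->slot[h];

        if (o->op == i->op && o->lhs == i->lhs && o->rhs == i->rhs)
            return o->result;   /* common subexpression found */
        h = (h + 1) & (TABLE_SIZE - 1);
    }
    t->slot[h] = i;
    t->used++;
    return i->result;
}
```

Each instruction costs one hash and, typically, one or two comparisons, with no text to re-read. A real compiler would also canonicalize commutative operands and handle instructions with side effects, which this sketch ignores.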
And that is why trying to avoid unnecessary header includes matters so
much. Because the front-end really does matter for compiler
performance.
(And it's at least partly why C++ is such a pain to compile, and why
C++ people want pre-compiled headers etc. You can't just do a forward
declaration of a struct type, and you get header inclusion from hell
when you have "clever" classes and inheritance etc. C++ build times
tend to be really nasty as a result).
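For the C side of that comparison, a minimal sketch of what a forward declaration buys you. The names struct device, struct driver and driver_bind are hypothetical and only loosely modelled on kernel naming; nothing here is copied from an actual header.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Forward declaration: the compiler learns that the type exists, but
 * not its layout. No header defining struct device has to be included.
 */
struct device;

/* Pointers to an incomplete type are fine in members and prototypes. */
struct driver {
    struct device *dev;
    int (*probe)(struct device *dev);
};

/* Stores the pointer and calls the probe hook without ever looking
 * inside struct device. */
static int driver_bind(struct driver *drv, struct device *dev)
{
    assert(drv != NULL);

    drv->dev = dev;
    return drv->probe ? drv->probe(dev) : 0;
}
```

Only the files that actually dereference a struct device need its full definition, so everything else skips lexing and parsing that header and whatever it includes in turn.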
Linus