Message-ID: <20110607090743.GC4133@elte.hu>
Date: Tue, 7 Jun 2011 11:07:43 +0200
From: Ingo Molnar <mingo@...e.hu>
To: david@...g.hm
Cc: pageexec@...email.hu, Andrew Lutomirski <luto@....edu>,
x86@...nel.org, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Jesper Juhl <jj@...osbits.net>,
Borislav Petkov <bp@...en8.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Jan Beulich <JBeulich@...ell.com>,
richard -rw- weinberger <richard.weinberger@...il.com>,
Mikael Pettersson <mikpe@...uu.se>,
Andi Kleen <andi@...stfloor.org>,
Brian Gerst <brgerst@...il.com>,
Louis Rilling <Louis.Rilling@...labs.com>,
Valdis.Kletnieks@...edu
Subject: Re: [PATCH v5 8/9] x86-64: Emulate legacy vsyscalls
* david@...g.hm <david@...g.hm> wrote:
> > why are you cutting out in all those mails of yours what i
> > already told you many times? the original statement from Andy was
> > about the int cc path vs. the pf path: he said that the latter
> > would not tolerate a few well predicted branches (if they were
> > put there at all, that is) because the pf handler is such a
> > critical fast path code. it is *not*. it can't be by almost
> > definition given how much processing it has to do (it is by far
> > one of the most complex of cpu exceptions to process).
>
> it seems to me that such a complicated piece of code that is
> executed so frequently is especially sensitive to anything that
> makes it take longer
Exactly.
Firstly, fully handling the most important types of minor page faults
takes about 2000 cycles on modern x86 hardware - so just two cycles of
extra overhead is 0.1%, and in the kernel we frequently do 0.01%
optimizations as well ...
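To make that arithmetic concrete, here is a minimal sketch in C. It assumes the roughly 2000-cycle figure quoted above; overhead_percent() is a hypothetical helper written for this illustration, not a kernel function.

```c
#include <assert.h>

/*
 * Relative cost, in percent, of adding 'extra' cycles to a path
 * that takes 'base' cycles. Hypothetical helper for illustration
 * only; nothing like this exists in the kernel.
 */
static double overhead_percent(double extra, double base)
{
	assert(base > 0.0);	/* a zero-cycle path makes no sense here */
	return 100.0 * extra / base;
}
```

Calling overhead_percent(2, 2000) gives about 0.1, i.e. the 0.1% mentioned above. The 2000-cycle base is an approximate figure, so the result should be read as an order-of-magnitude estimate.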
Secondly, we minimize the number of branches, even well-predicted
ones: the reason is to reduce the BTB footprint, which is a limited
CPU resource like the TLB. Every BTB entry we use up reduces the
effective BTB size visible to user-space applications.
Thirdly, we always try to minimize the L1 instruction cache footprint
of fastpaths as well, and every new instruction increases it.
Fourthly, the "single branch overhead" is the *best case* that is
rarely achieved in practice: often there are other instructions as
well, such as the compare instruction that precedes the branch ...
These are the reasons why we did various micro-optimizations in the
past like:
b80ef10e84d8: x86: Move do_page_fault()'s error path under unlikely()
92181f190b64: x86: optimise x86's do_page_fault (C entry point for the page fault path)
74a0b5762713: x86: optimize page faults like all other achitectures and kill notifier cruft
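As a rough illustration of the kind of change the first of those commits describes, the sketch below marks a rare error condition with unlikely() so the compiler can lay the common case out as straight-line code. The macros have the same shape as the kernel's hint macros (they rely on the GCC/Clang __builtin_expect builtin); struct fake_fault and handle_fake_fault() are hypothetical names invented for this example and are not taken from do_page_fault().

```c
#include <stddef.h>

/* Branch hints, same shape as the kernel's (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical fault descriptor, for illustration only. */
struct fake_fault {
	unsigned long address;
	int bad_area;	/* non-zero if the access cannot be resolved */
};

/* Hypothetical handler: returns 0 on the common path, -1 on error. */
static int handle_fake_fault(const struct fake_fault *f)
{
	/* Rare error path: hint the compiler to move it off the hot path. */
	if (unlikely(f == NULL || f->bad_area))
		return -1;

	/* Common case falls straight through. */
	return 0;
}
```

The hint does not remove the compare and branch; it only influences code layout, which is why the number of conditions in the fastpath still matters.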
So if he argues that a single condition does not matter to our page
fault fastpath then that is just crazy talk and I would not let him
near the page fault code with a ten-foot pole.
Thanks,
Ingo