[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090122230423.GA19569@elte.hu>
Date: Fri, 23 Jan 2009 00:04:23 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Jeremy Fitzhardinge <jeremy@...p.org>
Cc: Nick Piggin <npiggin@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, hpa@...or.com,
jeremy@...source.com, chrisw@...s-sol.org, zach@...are.com,
rusty@...tcorp.com.au
Subject: Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
* Jeremy Fitzhardinge <jeremy@...p.org> wrote:
> Ingo Molnar wrote:
>> Ouch, that looks unacceptably expensive. All the major distros turn
>> CONFIG_PARAVIRT on. paravirt_ops was introduced in x86 with the express
>> promise to have no measurable runtime overhead.
>>
>> ( And i suspect the real life mmap cost is probably even more expensive,
>> as on a Barcelona all of lmbench fits into the cache hence we dont see
>> any real $cache overhead. )
>>
>> Jeremy, any ideas where this slowdown comes from and how it could be
>> fixed?
>>
>
> I just posted a couple of patches to pick some low-hanging fruit. It
> turns out that we don't need to do any pvops calls to do pte flag
> manipulations. I'd be interested to see how much of a difference it
> makes (it reduces the static code size by a few k).
I've tried your patches - but can see no significant reduction in
overhead. I've updated my table with numbers from your patches:
-----------------------------------------------
| Performance counter stats for './mmap-perf' |
-----------------------------------------------
| | |
| defconfig | PARAVIRT=y | +Jeremy
|-----------------------------------------------------------------------
|
| 1311.55452 | 1360.62493 | 1378.94464 task clock (msecs) +3.74%
| | |
| 1 | 1 | 0 CPU migrations
| 91 | 79 | 77 context switches
| 55945 | 55943 | 55980 pagefaults
|.......................................................................
| 3781392474 | 3918777174 | 3907189795 CPU cycles +3.63%
| 1957153827 | 2161280486 | 2161741689 instructions +10.43%
| 50234816 | 51303520 | 50619593 cache references +2.12%
| 5428258 | 5583728 | 5575808 cache misses +2.86%
|
| 437983499 | 478967061 | 479053595 branches +9.36%
| 32486067 | 32336874 | 32377710 branch-misses -0.46%
| |
| 1314.78246 | 1363.69444 | 1357.58161 time elapsed (msecs) +3.72%
| |
------------------------------------------------------------------------
'+Jeremy' is a CONFIG_PARAVIRT=y run done with your patches.
The most stable count is the instruction count:
| 1957153827 | 2161280486 | 2161741689 instructions +10.43%
But your two patches did not reduce the instruction count in any
measurable way.
In any case, it is rather inefficient of me proxy-testing your patches,
you can do these measurements yourself too on any Core2 or later Intel
CPU, by running tip/master plus picking up these two utilities:
http://people.redhat.com/mingo/perfcounters/perfstat.c
http://redhat.com/~mingo/misc/mmap-perf.c
building them and running this (as root):
taskset 1 ./perfstat ./mmap-perf 1
it will give you numbers like the ones above.
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists