[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090224175055.GA14534@elte.hu>
Date: Tue, 24 Feb 2009 18:50:55 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dave Hansen <dave@...ux.vnet.ibm.com>,
Nick Piggin <nickpiggin@...oo.com.au>,
Salman Qazi <sqazi@...gle.com>, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, Andi Kleen <andi@...stfloor.org>,
Dave Hansen <haveblue@...ibm.com>
Subject: Re: Another Performance Regression in write() syscall
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Tue, 24 Feb 2009, Dave Hansen wrote:
> >
> > Yeah, that's a good point. Are we sure that's what is
> > happening here, though? That's one thing a profile would
> > hopefully help with.
>
> One thing to note is that _if_ it's purely an issue of
> nontemporal stores vs normal stores, then profiling is very
> likely going to be almost entirely useless. You'll get
> "results", but the results have nothing what-so-ever to do
> with reality or anything interesting.
>
> The nontemporal stores may stand out in the profiles, but the
> actual performance impact will be all about whether totally
> unrelated code got cache misses or not. Quite often those
> cache misses will also be in user mode, and very possibly in
> other processes.
>
> So profiles can certainly be interesting, but if Salman says
> that his patch makes a difference for his benchmark, then
> profiling is almost certainly not interesting FOR THAT PATCH.
> It's interesting mainly as a way to look at whether there are
> then also _other_ issues that are worth addressing (ie the
> whole atime thing is in a whole different dimension and an
> independent issue).
a 'perfstat' run would certainly be interesting (for cases where
a pure /usr/bin/time run is inconclusive), comparing the
unpatched and patched kernel.
That way we can see summary counts for the whole workload, like:
-----------------------------------------------
| Performance counter stats for './mmap-perf' |
-----------------------------------------------
| |
| x86-defconfig | PARAVIRT=y
|------------------------------------------------------------------
|
| 1311.554526 | 1360.624932 task clock ticks (msecs) +3.74%
| |
| 1 | 1 CPU migrations
| 91 | 79 context switches
| 55945 | 55943 pagefaults
| ............................................
| 3781392474 | 3918777174 CPU cycles +3.63%
| 1957153827 | 2161280486 instructions +10.43%
| 50234816 | 51303520 cache references +2.12%
| 5428258 | 5583728 cache misses +2.86%
|
| 437983499 | 478967061 branches +9.36%
| 32486067 | 32336874 branch-misses -0.46%
| |
| 1314.782469 | 1363.694447 time elapsed (msecs) +3.72%
| |
-----------------------------------
Such a comparison of would certainly be more meaningful for such
things than a profile.
Salman, if you are interested in doing a perfstat comparison,
just pick up a tip:master kernel [perfcounters are
default-enabled in it]:
http://people.redhat.com/mingo/tip.git/README
and run perfstat on it (as root, to get the kernel-mode counts
too):
http://redhat.com/~mingo/perfcounters/perfstat.c
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists