Message-ID: <20110423080258.GA14952@elte.hu>
Date: Sat, 23 Apr 2011 10:02:58 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Andi Kleen <ak@...ux.intel.com>
Cc: arun@...rma-home.net, Stephane Eranian <eranian@...gle.com>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Lin Ming <ming.m.lin@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
Arun Sharma <asharma@...com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add
missing user space support for config1/config2
* Andi Kleen <ak@...ux.intel.com> wrote:
> > > Yes, and note that with instructions events we even have skid-less PEBS
> > > profiling so seeing the precise
> > - location of instructions is possible.
>
> It was better when it was eaten. PEBS does not actually eliminate
> skid unfortunately. The interrupt still occurs later, so the
> instruction location is off.
>
> PEBS merely gives you more information.
Have you actually tried perf's PEBS support feature? Try:
perf record -e instructions:pp ./myapp
(the ':pp' postfix stands for 'precise' and activates PEBS+LBR tricks.)
Look at the perf report --tui annotated assembly output (or check 'perf
annotate' directly) and see how precise and skid-less the hits are. Works
pretty well on Nehalem.
Here's a cache-bound loop with skid (profiled with '-e instructions'):
: 0000000000400390 <main>:
0.00 : 400390: 31 c0 xor %eax,%eax
0.00 : 400392: eb 22 jmp 4003b6 <main+0x26>
12.08 : 400394: fe 84 10 50 08 60 00 incb 0x600850(%rax,%rdx,1)
87.92 : 40039b: 48 81 c2 10 27 00 00 add $0x2710,%rdx
0.00 : 4003a2: 48 81 fa 00 e1 f5 05 cmp $0x5f5e100,%rdx
0.00 : 4003a9: 75 e9 jne 400394 <main+0x4>
0.00 : 4003ab: 48 ff c0 inc %rax
0.00 : 4003ae: 48 3d 10 27 00 00 cmp $0x2710,%rax
0.00 : 4003b4: 74 04 je 4003ba <main+0x2a>
0.00 : 4003b6: 31 d2 xor %edx,%edx
0.00 : 4003b8: eb da jmp 400394 <main+0x4>
0.00 : 4003ba: 31 c0 xor %eax,%eax
Those 'ADD' instruction hits are bogus: 99% of the cost in this function is in
the INCB, but the PMU NMI often skids to the next (few) instructions.
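For reference, a loop of roughly the following shape would compile to the
listing above. This is only a reconstruction from the disassembly, not the
original test program: the names N, array and walk_columns are made up, and
the 10000 x 10000 byte layout is inferred from the 0x2710 stride and the
0x5f5e100 bound in the ADD and CMP.

```c
#include <stddef.h>

/* Assumption: 10000 == 0x2710, the stride and outer bound seen in the listing. */
#define N 10000

/* Roughly 100 MB of bytes; hypothetical stand-in for the data at 0x600850. */
static unsigned char array[N][N];

/*
 * Walk the array column by column: the inner loop advances by a whole row
 * (N bytes) per iteration, matching 'add $0x2710,%rdx', and the byte
 * increment corresponds to the INCB. Returns the number of increments done.
 */
unsigned long walk_columns(void)
{
    unsigned long touched = 0;

    for (size_t i = 0; i < N; i++) {        /* %rax in the listing */
        for (size_t j = 0; j < N; j++) {    /* %rdx == j * N in the listing */
            array[j][i]++;
            touched++;
        }
    }
    return touched;
}
```

Called from main(), consecutive INCB accesses are N bytes apart, which is the
kind of stride that makes a loop memory-bound, while the ADD/CMP/JNE only
touch registers.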
Profiled with "-e instructions:pp" we get:
: 0000000000400390 <main>:
0.00 : 400390: 31 c0 xor %eax,%eax
0.00 : 400392: eb 22 jmp 4003b6 <main+0x26>
85.33 : 400394: fe 84 10 50 08 60 00 incb 0x600850(%rax,%rdx,1)
0.00 : 40039b: 48 81 c2 10 27 00 00 add $0x2710,%rdx
14.67 : 4003a2: 48 81 fa 00 e1 f5 05 cmp $0x5f5e100,%rdx
0.00 : 4003a9: 75 e9 jne 400394 <main+0x4>
0.00 : 4003ab: 48 ff c0 inc %rax
0.00 : 4003ae: 48 3d 10 27 00 00 cmp $0x2710,%rax
0.00 : 4003b4: 74 04 je 4003ba <main+0x2a>
0.00 : 4003b6: 31 d2 xor %edx,%edx
0.00 : 4003b8: eb da jmp 400394 <main+0x4>
0.00 : 4003ba: 31 c0 xor %eax,%eax
The INCB has the most hits as expected - but we also learn that there's
something about the CMP.
Thanks,
Ingo