linux-kernel - Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2


Message-ID: <20110423080258.GA14952@elte.hu>
Date:	Sat, 23 Apr 2011 10:02:58 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Andi Kleen <ak@...ux.intel.com>
Cc:	arun@...rma-home.net, Stephane Eranian <eranian@...gle.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Lin Ming <ming.m.lin@...el.com>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, eranian@...il.com,
	Arun Sharma <asharma@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add
 missing user space support for config1/config2


* Andi Kleen <ak@...ux.intel.com> wrote:

> > > Yes, and note that with instructions events we even have skid-less PEBS 
> > > profiling so seeing the precise .
> >                                   - location of instructions is possible.
> 
> It was better when it was eaten. PEBS does not actually eliminate
> skid unfortunately. The interrupt still occurs later, so the
> instruction location is off.
> 
> PEBS merely gives you more information.

Have you actually tried perf's PEBS support feature? Try:

  perf record -e instructions:pp ./myapp

(the ':pp' postfix stands for 'precise' and activates PEBS+LBR tricks.)

Look at the 'perf report --tui' annotated assembly output (or check 'perf 
annotate' directly) and see how precise and skid-less the hits are. Works 
pretty well on Nehalem.
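
For anyone driving the syscall directly rather than going through the perf 
tool: each 'p' in the modifier raises the precise_ip field of struct 
perf_event_attr by one, so ':pp' corresponds to precise_ip = 2 ("requested to 
have 0 skid"). A minimal sketch follows; the helper name and the sample period 
are made up for illustration and are not taken from the perf sources.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper: describe a sampled 'instructions' event at the
 * precision level that the ':pp' modifier asks for.
 */
void init_precise_instructions_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;
    attr->sample_period = 100000;       /* arbitrary illustrative value */
    attr->sample_type = PERF_SAMPLE_IP; /* record the instruction pointer */
    attr->precise_ip = 2;               /* ':pp': request zero skid */
    attr->disabled = 1;                 /* start stopped, enable explicitly */
}
```

The filled-in attr would then be handed to perf_event_open(), which glibc does 
not wrap, so it has to be invoked via syscall(__NR_perf_event_open, ...). The 
call can fail on hardware or configurations that cannot honour the requested 
precision.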

Here's a cache-bound loop with skid (profiled with '-e instructions'):

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   12.08 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
   87.92 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
    0.00 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

Those 'ADD' instruction hits are bogus: 99% of the cost in this function is in 
the INCB, but the PMU NMI often skids to the next (few) instructions.
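
For reference, the disassembly is consistent with a strided loop along the 
lines below. This is a reconstruction from the listing, not the original test 
source: the function name and the parameterisation are assumptions, and the 
listing itself has the loops inline in main() over a static array at 0x600850.

```c
#include <stddef.h>

/*
 * Increment every byte of an n*n byte buffer exactly once, but walk it
 * with a stride of n bytes so that consecutive accesses land far apart
 * in memory. Returns the number of increments performed.
 */
unsigned long strided_touch(unsigned char *buf, size_t n)
{
    unsigned long count = 0;

    for (size_t i = 0; i < n; i++) {              /* outer loop: %rax */
        for (size_t j = 0; j < n * n; j += n) {   /* inner loop: %rdx */
            buf[i + j]++;                         /* the INCB */
            count++;
        }
    }
    return count;
}
```

With n = 10000 (the 0x2710 immediate) the inner bound is 100000000 (the 
0x5f5e100 immediate), so the buffer needs to be 100000000 bytes and each access 
lands 10000 bytes after the previous one, which is why the loop is cache-bound.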

Profiled with "-e instructions:pp" we get:

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   85.33 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
    0.00 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
   14.67 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

The INCB has the most hits, as expected, but we also learn that the CMP picks 
up a real share of them (14.67%), which the skidding profile hid entirely.

Thanks,

	Ingo