lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 27 Jan 2012 00:01:27 +0100
From:	Lénaïc Huard <lenaic@...ard.fr.eu.org>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>, Ingo Molnar <mingo@...e.hu>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Shift by one instruction in the perf annotate output

Hello,

Using perf and zooming up to the annotated assembler, I noticed a “shift by 
one” issue between the assembler instructions and the reported consumed cycles 
in the left margin.
Each time a given instruction is expensive, the high event count appears on 
the line right after.
In order to validate that feeling, I tried to put an expensive instruction 
(division) inside a block of cheaper instructions (no operation). And here is 
the output of perf annotate:

    0,00 :          40041d:       nop
    0,00 :          40041e:       nop
    5,31 :          40041f:       nop                                                                                                                                               
    0,00 :          400420:       divl   (%rsp)
   84,18 :          400423:       nop                                                                                                                                               
    0,00 :          400424:       nop
    0,07 :          400425:       nop

I would have expected the 84,18 count to be in front of the “divl” instruction 
and not in front of the “nop” one.

I am running Linux and perf 3.2 but I remember that previous versions suffered 
from the same issue.

I don’t know if it could be specific to my cpu:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
stepping        : 6
microcode       : 0xd0
cpu MHz         : 2394.000
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor 
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow
bogomips        : 4807.89
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

This issue is not specific to the perf tool itself since I observe exactly the 
same thing with OProfile.
Here is the output of opannotate for the same piece of code:

               :  40041d:       nop
               :  40041e:       nop
   971  0.5636 :  40041f:       nop
     1 5.8e-04 :  400420:       divl   (%rsp)
165045 95.7904 :  400423:       nop
     4  0.0023 :  400424:       nop
    18  0.0104 :  400425:       nop

Once again, the high number of samples should have been collected in “divl” 
and not in “nop”.


Here is the small program I used to observe that phenomenon (x86(-64) 
specific):

#include <stdlib.h>
#include <stdio.h>

void divide( int  dividend,
             int  divisor,
             int *quotient,
             int *remainder )
{
  for( unsigned i = 0 ; i < 1000000000 ; i++ )
  __asm__ __volatile__
    ( "xor %%edx,%%edx;"
      "nop;"
      "nop;"
      "nop;"
      "divl %[divisor];"
      "nop;"
      "nop;"
      "nop;"
      : "=a" (*quotient), "=&d" (*remainder)
      : "a" (dividend), [divisor] "m" (divisor)
      );
}

int main()
{
  int quotient, remainder;
  divide( 1234, 100, &quotient, &remainder );
  printf( "%d = %d x %d + %d\n", 1234, 100, quotient, remainder );
  return EXIT_SUCCESS;
}

$ gcc -pipe -o div div.c -std=c99 -O3
$ perf record -g ./div
$ perf report -g
$ perf annotate main # -O3 made divide() inlined
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ