Message-ID: <Pine.LNX.4.64.0906240937120.10620@pianoman.cluster.toy>
Date:	Wed, 24 Jun 2009 09:59:54 -0400 (EDT)
From:	Vince Weaver <vince@...ter.net>
To:	linux-kernel@...r.kernel.org
Subject: performance counter 20% error finding retired instruction count
Hello
As an aside, is it time to set up a dedicated Performance Counters
for Linux mailing list?   (Hereafter referred to as p10c7l to avoid
confusion with the other implementations that have already taken
all the good abbreviated forms of the concept).  If/when the 
infrastructure appears in a released kernel, there's going to be a lot of
chatter from people who use performance counters and suddenly find
themselves stuck with a huge step backwards in functionality.  And asking Fortran
programmers to provide kernel patches probably won't be a productive 
response.  But I digress.
I was trying to get an exact retired instruction count from p10c7l.
I am using the test million.s, available here
  ( http://www.csl.cornell.edu/~vince/projects/perf_counter/million.s )
It should count exactly one million instructions.
Tests with valgrind and qemu show that it does.
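For reference, the idea behind the test can be sketched in C with x86
inline assembly (a sketch only, not the actual million.s: the compiler's
prologue/epilogue and especially libc startup retire extra instructions,
which is why the real test is a bare static assembly program):

    /* loop.c: retire a number of instructions known by construction.
     * Each iteration retires two instructions (dec + jnz), so the
     * loop alone contributes 2 * 500000 = 1000000 retired
     * instructions; everything outside the loop is overhead.
     */
    int main(void)
    {
            unsigned long i = 500000;

            __asm__ volatile(
                    "1:\n\t"
                    "dec %0\n\t"        /* one instruction */
                    "jnz 1b\n\t"        /* one instruction */
                    : "+r" (i));

            return 0;
    }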
With perfmon2, Pentium Pro, PII, PIII, P4, Athlon32, and Phenom
machines all give the proper result:
tobler:~% pfmon -e retired_instructions ./million
1000002 RETIRED_INSTRUCTIONS
    ( it is 1,000,002 +/-2 because on most x86 architectures the retired
      instruction count includes any hardware interrupts that happen to
      arrive during the run.  It would be a great feature if p10c7l
      could add some way of gathering a per-process hardware
      interrupt count statistic to help quantify that).
Yet perf on the same Athlon32 machine (running
kernel 2.6.30-03984-g45e3e19) gives:
tobler:~% perf stat ./million
  Performance counter stats for './million':
        1.519366  task-clock-ticks     #       0.835 CPU utilization factor
               3  context-switches     #       0.002 M/sec
               0  CPU-migrations       #       0.000 M/sec
              53  page-faults          #       0.035 M/sec
         2483822  cycles               #    1634.775 M/sec
         1240849  instructions         #     816.689 M/sec # 0.500 per cycle
          612685  cache-references     #     403.250 M/sec
            3564  cache-misses         #       2.346 M/sec
  Wall-clock time elapsed:     1.819226 msecs
Running it multiple times gives:
    1240849
    1257312
    1242313
That is an error of at least 20%, and it isn't even consistent
from run to run.  Is this because of sampling?  The documentation
doesn't really warn about this, as far as I can tell.
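In case it helps anyone reproduce this outside the perf tool, here is a
minimal self-contained counter reader.  It is sketched against the
perf_event_open(2) interface of later kernels; the syscall in this
patchset is still named perf_counter_open, so the attr layout and
constant names may differ:

    /* count.c: open a hardware instruction counter on the current
     * process and read it around a region of interest.
     */
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            struct perf_event_attr attr;
            uint64_t count;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.type = PERF_TYPE_HARDWARE;
            attr.size = sizeof(attr);
            attr.config = PERF_COUNT_HW_INSTRUCTIONS;
            attr.disabled = 1;
            attr.exclude_kernel = 1;   /* user-space instructions only */
            attr.exclude_hv = 1;

            /* pid 0, cpu -1: this process, on any CPU */
            fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0) {
                    perror("perf_event_open");
                    return 1;
            }

            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

            /* ... code under test goes here ... */

            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            if (read(fd, &count, sizeof(count)) == sizeof(count))
                    printf("%llu instructions\n",
                           (unsigned long long)count);

            close(fd);
            return 0;
    }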
Thanks for any help resolving this problem.
Vince