Date:	Tue, 04 May 2010 08:54:01 +0200
From:	Victor Javier <victorjj@...upc.edu>
To:	linux-kernel@...r.kernel.org
Subject: significant variation in performance counters on POWER6

Hello,


I am doing some research for which I need to collect performance counter
data for the SPEC CPU2006 benchmarks on a POWER6 JS22 system. Previously
I was using perfmon2, but after the release of "performance counters for
linux" (and the 'perf' tool), I decided to try it. One of the reasons
was the native support for event multiplexing.

However, I have been noticing much higher variability when using perf
compared to perfmon2. As an example, I will provide data for the 'bwaves'
benchmark run with the reference input set (it takes around 20
minutes to finish).

The information for the kernels I am using is:
* perfmon2: Linux version 2.6.28-pfmon2 (gcc version 4.1.2 20070115 (SUSE Linux)) #6 SMP
* perf: Linux version 2.6.33.3-perf (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP

I am using libpfm version 3.8.

I can provide more information (modules, detailed processor
information, etc.) if necessary.

The commands I used to collect the counters are:

perfmon2: pfmon -e PM_CYC,PM_INST_CMPL,PM_LD_MISS_L1 ./bwaves_base.Linux64
perf: perf stat -e r1e:u,r2:u,r80080:u ./bwaves_base.Linux64
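
As a side note, the same measurement could probably also be expressed
with perf's generic event names (untested sketch; it assumes the generic
events map to the same POWER6 PMU events as the raw codes above):

perf: perf stat -e cycles:u,instructions:u,L1-dcache-load-misses:u ./bwaves_base.Linux64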

I also tried pinning the execution to a given CPU, but the results were
the same.
I repeated each execution 10 times, so I am also providing the mean and
the standard deviation.
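
For reference, a pinned and repeated run can be scripted along these
lines (sketch only: the CPU number is arbitrary, and perf's -r/--repeat
option, if available in this build, runs the workload several times and
prints the mean and relative standard deviation directly):

# pin the workload to CPU 0 and let perf repeat it 10 times (illustrative)
taskset -c 0 perf stat -r 10 -e r1e:u,r2:u,r80080:u ./bwaves_base.Linux64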

============
= perfmon2 =
============

        cycles               instrs completed     L1 load misses
        4,567,041,667,206    2,772,827,993,242    6,918,871,375
        4,569,071,274,248    2,772,827,992,642    6,931,066,292
        4,568,234,790,260    2,772,827,992,716    6,922,975,235
        4,566,485,780,016    2,772,827,992,065    6,917,600,192
        4,566,437,677,239    2,772,827,992,067    6,915,222,376
        4,566,640,807,800    2,772,827,992,066    6,915,703,838
        4,566,466,402,423    2,772,827,992,062    6,914,107,325
        4,569,322,329,138    2,772,828,006,865    6,933,546,730
        4,567,018,722,323    2,772,827,992,066    6,914,210,622
        4,566,778,622,700    2,772,827,992,066    6,914,251,098

mean  4,567,349,807,335    2,772,827,993,786    6,919,755,508
stdev     1,107,043,810                4,614        7,178,958

========
= perf =
========

        cycles               instrs completed     L1 load misses
        4,562,017,366,591    2,772,768,370,128    7,134,353,697
        4,541,500,651,248    2,772,868,724,285    6,341,491,710
        4,550,876,532,582    2,772,787,520,375    6,661,719,666
        4,540,558,691,334    2,772,868,724,156    6,266,617,715
        4,573,942,460,136    2,772,861,831,519    7,419,020,488
        4,587,876,861,751    2,772,868,724,189    8,174,507,077
        4,550,771,568,044    2,772,841,147,861    6,547,437,055
        4,600,947,093,875    2,772,787,520,375    9,152,895,835
        4,572,501,705,517    2,772,861,831,526    7,765,464,256
        4,561,690,369,227    2,772,787,520,368    6,902,452,934

mean  4,564,268,330,031    2,772,830,191,478    7,236,596,043
stdev    19,770,352,264           41,980,009      914,965,698

As can be seen, the standard deviation for perf is significantly higher.
For the instructions completed, perf shows a roughly 9,000x higher
standard deviation (41,980,009 vs. 4,614). Although this variation may
not look large compared to the absolute number of instructions completed,
it is a real problem for the L1 load misses. With perfmon2 I can expect
misses to fall in the range [6,905,397,592 .. 6,934,113,424] (mean +/- 2
standard deviations), which is a tight interval. With perf, however, the
same interval grows to [5,406,664,646 .. 9,066,527,440]. This variation
is clearly not acceptable, as I cannot really draw any conclusion from
those results.
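
For reference, the intervals can be recomputed from the raw counts with
something like the following (illustrative only; it assumes one count per
line, without thousands separators, in a file such as misses.txt):

# mean, sample stddev and mean +/- 2*stddev interval over the runs
awk '{ v[NR] = $1; s += $1 }
     END { m = s/NR;
           for (i = 1; i <= NR; i++) d += (v[i]-m)*(v[i]-m);
           sd = sqrt(d/(NR-1));
           printf "mean %.0f  stdev %.0f  interval [%.0f .. %.0f]\n",
                  m, sd, m-2*sd, m+2*sd }' misses.txt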

I would like to know whether you are aware of this issue, and what the
causes could be. I would also appreciate any help in fixing it.

In case the data is hard to read here, I also provide it as a separate
PDF file, together with a couple of graphs showing the variation for
instructions and misses.

Thank you for any help on this,
Victor



Download attachment "graphs.pdf" of type "video/x-ms-wm" (20524 bytes)

Download attachment "data.pdf" of type "video/x-ms-wm" (38315 bytes)
