Date:   Tue, 13 Feb 2018 16:47:12 +0000
From:   Jason Vas Dias <jason.vas.dias@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: perf Intel x86_64 : BUG: BRANCH_INSTRUCTIONS / BRANCH_MISSES cannot
 be combined with CACHE_REFERENCES / CACHE_MISSES .

Good day -

I'd much appreciate some advice as to why, on my Intel x86_64
( DisplayFamily_DisplayModel : 06_3CH ), running either Linux 4.12.10
or Linux 3.10.0, any attempt to count
     PERF_COUNT_HW_BRANCH_INSTRUCTIONS
          (or raw config 0xC4) and
     PERF_COUNT_HW_BRANCH_MISSES
          (or raw config 0xC5)
in the same group as
     PERF_COUNT_HW_CACHE_REFERENCES
         (or raw config 0x4F2E) and
     PERF_COUNT_HW_CACHE_MISSES
         (or raw config 0x412E)
results in ALL COUNTERS BEING 0 in a read of the group FD or
mmap sample area.

The attached example program demonstrates this. It calls perf_event_open()
to create a group leader FD for the first event, attaches every other
event to that event group, and then reads all events through the group FD.

The perf_event_open() calls and the ioctl(event_fd, PERF_EVENT_IOC_ID, &id)
calls all return successfully , but if I combine ANY of
( PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
  PERF_COUNT_HW_BRANCH_MISSES
) with any of
( PERF_COUNT_HW_CACHE_REFERENCES,
  PERF_COUNT_HW_CACHE_MISSES
) in the Event Group, ALL events have '0' event->value.
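For readers without the attachment, the group setup and read described above can be sketched roughly as follows. This is not the code from perf_bug.c; the helper names (group_attr_init, group_parse, group_never_ran, group_value) are made up for illustration. The sketch assumes the leader is opened with PERF_FORMAT_GROUP | PERF_FORMAT_ID plus the two TOTAL_TIME flags, so that a read() on the leader returns nr, time_enabled, time_running and then one (value, id) pair per event. A time_running of 0 would suggest that the group was never scheduled onto the PMU, which is one possible reason for every value being 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: fill in the attr for one member of a counting group. */
static void group_attr_init(struct perf_event_attr *attr,
                            uint32_t type, uint64_t config, int is_leader)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
    attr->disabled = is_leader ? 1 : 0;   /* only the leader starts disabled */
    attr->exclude_kernel = 1;             /* count user space only */
    attr->exclude_hv = 1;
    attr->read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID |
                        PERF_FORMAT_TOTAL_TIME_ENABLED |
                        PERF_FORMAT_TOTAL_TIME_RUNNING;
}

/* Parsed view of the u64 words returned by read() on the group leader. */
struct group_counts {
    uint64_t nr;             /* number of events in the group */
    uint64_t time_enabled;   /* ns the group was enabled */
    uint64_t time_running;   /* ns the group was actually on the PMU */
    const uint64_t *pairs;   /* nr pairs of (value, id) */
};

/* Returns 0 on success, -1 if the buffer is too short for its own header. */
static int group_parse(const uint64_t *buf, size_t nwords,
                       struct group_counts *g)
{
    if (nwords < 3)
        return -1;
    g->nr = buf[0];
    g->time_enabled = buf[1];
    g->time_running = buf[2];
    if (g->nr > (nwords - 3) / 2)
        return -1;
    g->pairs = buf + 3;
    return 0;
}

/* 1 if the group was enabled but never got onto the hardware counters. */
static int group_never_ran(const struct group_counts *g)
{
    return g->time_enabled != 0 && g->time_running == 0;
}

/* Look up a counter by the id obtained from PERF_EVENT_IOC_ID. */
static int group_value(const struct group_counts *g, uint64_t id,
                       uint64_t *value)
{
    for (uint64_t i = 0; i < g->nr; i++) {
        if (g->pairs[2 * i + 1] == id) {
            *value = g->pairs[2 * i];
            return 0;
        }
    }
    return -1;
}
```

Since glibc has no wrapper for perf_event_open(), each attr would be passed via syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0), with group_fd being -1 for the leader and the leader's FD for every other event.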

Demo :
1. Compile the program to use the kernel-mapped Generic Events:
  $ gcc -std=gnu11 -o perf_bug perf_bug.c
  Running the program shows that all counters are 0, since both the
  CACHE and the BRANCH events are being requested:

  $ ./perf_bug
  EVENT: Branch Instructions : 0
  EVENT: Branch Misses : 0
  EVENT: Instructions : 0
  EVENT: CPU Cycles : 0
  EVENT: Ref. CPU Cycles : 0
  EVENT: Bus Cycles : 0
  EVENT: Cache References : 0
  EVENT: Cache Misses : 0

  NOT registering interest in EITHER the BRANCH counters
  OR the CACHE counters fixes the problem:

  Compile without registering for BRANCH_INSTRUCTIONS
  or BRANCH_MISSES:
  $ gcc -std=gnu11 -DNO_BUG_NO_BRANCH  -o perf_bug perf_bug.c
  $ ./perf_bug
  EVENT: Instructions : 914
  EVENT: CPU Cycles : 4110
  EVENT: Ref. CPU Cycles : 4437
  EVENT: Bus Cycles : 152
  EVENT: Cache References : 1
  EVENT: Cache Misses : 1

  Compile without registering for CACHE_REFERENCES or CACHE_MISSES:
  $ gcc -std=gnu11 -DNO_BUG_NO_CACHE  -o perf_bug perf_bug.c
  $ ./perf_bug
  EVENT: Branch Instructions : 106
  EVENT: Branch Misses : 6
  EVENT: Instructions : 914
  EVENT: CPU Cycles : 4132
  EVENT: Ref. CPU Cycles : 8526
  EVENT: Bus Cycles : 295

The same thing happens if I do not use Generic Events, but rather
"dynamic raw PMU" events, by putting the hex values from
/sys/bus/event_source/devices/cpu/events/? into the perf_event_attr
config, OR'ed with (1<<63), and using the PERF_TYPE_RAW perf_event_attr
type value :

$ gcc -DUSE_RAW_PMU -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Branch Instructions : 0
EVENT: Branch Misses : 0
EVENT: Instructions : 0
EVENT: CPU Cycles : 0
EVENT: Ref. CPU Cycles : 0
EVENT: Bus Cycles : 0
EVENT: Cache References : 0
EVENT: Cache Misses : 0


$ gcc -DUSE_RAW_PMU -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Instructions : 914
EVENT: CPU Cycles : 4102
EVENT: Ref. CPU Cycles : 4959
EVENT: Bus Cycles : 171
EVENT: Cache References : 2
EVENT: Cache Misses : 2

$ gcc -DUSE_RAW_PMU -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Branch Instructions : 106
EVENT: Branch Misses : 6
EVENT: Instructions : 914
EVENT: CPU Cycles : 4108
EVENT: Ref. CPU Cycles : 10817
EVENT: Bus Cycles : 373
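
As a side note on where the raw config values come from: on the Intel core PMU the event select sits in config bits 0-7 and the unit mask in bits 8-15, so the four values quoted in this mail can be derived as in the sketch below. The function names raw_config and raw_event_name are made up for illustration, and the layout is an assumption that can be checked against /sys/bus/event_source/devices/cpu/format/ on the machine in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: event select in bits 0-7, unit mask in bits 8-15. */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8);
}

/*
 * Format a config as the "rNNN" hex syntax that perf stat -e accepts.
 * Returns what snprintf() returns: the length of the formatted name.
 */
static int raw_event_name(char *buf, size_t len, uint64_t config)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "r%llx", (unsigned long long)config);
}
```

For example, raw_config(0x2E, 0x41) gives 0x412E (LLC Miss) and raw_config(0xC4, 0x00) gives 0xC4 (Branch Instructions Retired).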


The perf tool itself seems to have the same issue.

With both CACHE and BRANCH counters in the group, it does not work:
$ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1

 Performance counter stats for 'sleep 1':

     <not counted>      r0c4      (0.00%)
     <not counted>      r0c5      (0.00%)
     <not counted>      r0c0      (0.00%)
     <not counted>      r03c      (0.00%)
     <not counted>      r0300     (0.00%)
     <not counted>      r013c     (0.00%)
     <not counted>      r04F2E    (0.00%)
   <not supported>      r0412E

       1.001652932 seconds time elapsed

   Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog

Disabling the NMI watchdog makes no difference .

It is very strange that perf thinks 'r0412E' is not supported :
   $ cat /sys/bus/event_source/devices/cpu/events/cache-misses
   event=0x2e,umask=0x41

The kernel should not be advertising an unsupported event
in a /sys/bus/event_source/devices/cpu/events/ file, should it?
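
To show how such a sysfs string relates to the raw value given to perf, here is a minimal sketch that turns "event=0x2e,umask=0x41" into 0x412e. The name parse_sysfs_event is made up, and the sketch only understands the "event=..." and "event=...,umask=..." forms; any further terms (cmask, inv and so on) are silently ignored. It also assumes the event goes in config bits 0-7 and the umask in bits 8-15.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "event=0xNN" or "event=0xNN,umask=0xMM" into a raw config.
 * Returns 0 on success, -1 if the string does not start with an
 * "event=" term or a field does not fit in 8 bits.
 */
static int parse_sysfs_event(const char *s, uint64_t *config)
{
    unsigned int event = 0, umask = 0;
    int n;

    assert(s != NULL && config != NULL);
    n = sscanf(s, "event=%x,umask=%x", &event, &umask);
    if (n < 1)
        return -1;
    if (n == 1)
        umask = 0;            /* no umask term present */
    if (event > 0xff || umask > 0xff)
        return -1;
    *config = (uint64_t)event | ((uint64_t)umask << 8);
    return 0;
}
```

Fed with the output of the cat command above, this yields 0x412e, which is the value behind the 'r0412E' event that perf reports as not supported.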

So perf stat has the same problem. Leaving out either the cache or the
branch counters makes it work fine:

without cache:
$ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c}:SIu' sleep 1

 Performance counter stats for 'sleep 1':

             37740      r0c4
              3557      r0c5
            188552      r0c0
            311684      r03c
            360963      r0300
             12461      r013c

       1.001508109 seconds time elapsed

without branch:
$ perf stat -e '{r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1

 Performance counter stats for 'sleep 1':

            188554      r0c0
            320242      r03c
            452748      r0300
             15633      r013c
              4145      r04F2E
              3022      r0412E

       1.001810421 seconds time elapsed

This shows again that perf's claim that 'r0412E' is not supported is bogus.
Table 19-1 of the Intel SDM lists the architectural events, which ALL Intel
CPUs are meant to support; it includes 'Event: 2EH | Umask: 4FH : LLC
Reference' and 'Event: 2EH | Umask: 41H : LLC Miss', as well as
'Event: C4H | Umask: 00H : Branch Instructions Retired' and
'Event: C5H | Umask: 00H : Branch Misses Retired'.
So why can't perf count them all in the same group?

Please , can anyone enlighten me as to what is going on here ?

Why can't I count all of
   ( BRANCH_INSTRUCTIONS , BRANCH_MISSES ,
     CACHE_REFERENCES, CACHE_MISSES
  )
in the same Perf Event Group ?

Thanks in advance for any replies,
Best Regards,
Jason

View attachment "perf_bug.c" of type "text/x-csrc" (5785 bytes)
