Message-ID: <CALyZvKyQFp0CfwFfBosM=OK5DJgb71bhMAkhLU2Wdv0asTSVZw@mail.gmail.com>
Date: Tue, 13 Feb 2018 16:47:12 +0000
From: Jason Vas Dias <jason.vas.dias@...il.com>
To: linux-kernel@...r.kernel.org
Subject: perf Intel x86_64 : BUG: BRANCH_INSTRUCTIONS / BRANCH_MISSES cannot
be combined with CACHE_REFERENCES / CACHE_MISSES .
Good day -
I'd much appreciate some advice as to why, on my Intel x86_64
( DisplayFamily_DisplayModel : 06_3CH ), running either Linux 4.12.10,
or Linux 3.10.0, any attempt to count all of :
PERF_COUNT_HW_BRANCH_INSTRUCTIONS
(or raw config 0xC4) , and
PERF_COUNT_HW_BRANCH_MISSES
(or raw config 0xC5),
combined with
PERF_COUNT_HW_CACHE_REFERENCES
(or raw config 0x4F2E ), and
PERF_COUNT_HW_CACHE_MISSES
(or raw config 0x412E) ,
results in ALL COUNTERS BEING 0 in a read of the Group FD or
mmap sample area.
This is demonstrated by the attached example program (perf_bug.c), which
uses perf_event_open() to create a Group Leader FD for the first event
and associates all other events with that Event Group, so that all
events can be read on the group FD.
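The attachment is not reproduced in this message. As a rough sketch of
the group setup described above (not the attached code), assuming
user-space-only counting on the calling thread, it might look like the
following. The names make_attr() and open_group() are hypothetical
helpers introduced only for this illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc ships no wrapper for this syscall, so call it directly. */
int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                    int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: describe one generic hardware event.
 * Assumption: user-space-only counting, group-style reads with IDs. */
struct perf_event_attr make_attr(unsigned long long config, int is_leader)
{
    struct perf_event_attr a;

    memset(&a, 0, sizeof a);
    a.size = sizeof a;
    a.type = PERF_TYPE_HARDWARE;
    a.config = config;
    a.disabled = is_leader ? 1 : 0;  /* leader starts disabled, siblings follow it */
    a.exclude_kernel = 1;
    a.exclude_hv = 1;
    a.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
    return a;
}

/* Hypothetical helper: open configs[0] as the group leader and every
 * other entry as a sibling of it. Returns the leader fd, or -1 on error. */
int open_group(const unsigned long long *configs, int n, int *fds)
{
    for (int i = 0; i < n; i++) {
        struct perf_event_attr a = make_attr(configs[i], i == 0);

        /* pid 0 / cpu -1: this thread on any CPU; group_fd -1 makes a leader. */
        fds[i] = perf_event_open(&a, 0, -1, i == 0 ? -1 : fds[0], 0);
        if (fds[i] < 0) {
            while (i-- > 0)
                close(fds[i]);
            return -1;
        }
    }
    return fds[0];
}
```

With such a setup, the leader is enabled with
ioctl(fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) and a single
read() on the leader fd returns the counts of all members.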
The perf_event_open() calls and the ioctl(event_fd, PERF_EVENT_IOC_ID, &id)
calls all return successfully , but if I combine ANY of
( PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
PERF_COUNT_HW_BRANCH_MISSES
) with any of
( PERF_COUNT_HW_CACHE_REFERENCES,
PERF_COUNT_HW_CACHE_MISSES
) in the Event Group, ALL events have '0' event->value.
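For reference, the value/ID pairing mentioned above can be sketched as
follows. This assumes read_format is exactly
PERF_FORMAT_GROUP | PERF_FORMAT_ID (no time fields), in which case the
buffer returned by read() on the group FD is a 64-bit count 'nr'
followed by 'nr' pairs of { value, id }. The function name
group_value_for_id() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the counter value that belongs to 'id'
 * (as returned by PERF_EVENT_IOC_ID) in a group read buffer.
 *
 * Assumed layout (PERF_FORMAT_GROUP | PERF_FORMAT_ID only):
 *   buf[0]       = nr
 *   buf[1 + 2*i] = value of event i
 *   buf[2 + 2*i] = id of event i
 *
 * Returns 0 and stores the value on success, -1 if the id is absent
 * or the buffer is too short for the 'nr' it claims. */
int group_value_for_id(const uint64_t *buf, size_t buf_words,
                       uint64_t id, uint64_t *value_out)
{
    if (buf_words < 1)
        return -1;

    uint64_t nr = buf[0];
    if (nr > (buf_words - 1) / 2)
        return -1;

    for (uint64_t i = 0; i < nr; i++) {
        if (buf[2 + 2 * i] == id) {
            *value_out = buf[1 + 2 * i];
            return 0;
        }
    }
    return -1;
}
```

Matching by ID rather than by position avoids depending on the order
in which the kernel lists the group members.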
Demo :
1. Compile program to use kernel mapped Generic Events:
$ gcc -std=gnu11 -o perf_bug perf_bug.c
Running the program shows that all counters have 0 values, since both
CACHE & BRANCH hits+misses are being requested:
$ ./perf_bug
EVENT: Branch Instructions : 0
EVENT: Branch Misses : 0
EVENT: Instructions : 0
EVENT: CPU Cycles : 0
EVENT: Ref. CPU Cycles : 0
EVENT: Bus Cycles : 0
EVENT: Cache References : 0
EVENT: Cache Misses : 0
NOT registering interest in EITHER the BRANCH counters
OR the CACHE counters fixes the problem:
Compile without registering for BRANCH_INSTRUCTIONS
or BRANCH_MISSES:
$ gcc -std=gnu11 -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Instructions : 914
EVENT: CPU Cycles : 4110
EVENT: Ref. CPU Cycles : 4437
EVENT: Bus Cycles : 152
EVENT: Cache References : 1
EVENT: Cache Misses : 1
Compile without registering for CACHE_REFERENCES or CACHE_MISSES:
$ gcc -std=gnu11 -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Branch Instructions : 106
EVENT: Branch Misses : 6
EVENT: Instructions : 914
EVENT: CPU Cycles : 4132
EVENT: Ref. CPU Cycles : 8526
EVENT: Bus Cycles : 295
The same thing happens if I do not use Generic Events, but rather
"dynamic raw PMU" events, by putting the hex values from
/sys/bus/event_source/devices/cpu/events/? into the perf_event_attr
config, OR'ed with (1<<63), and using the PERF_TYPE_RAW perf_event_attr
type value :
$ gcc -DUSE_RAW_PMU -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Branch Instructions : 0
EVENT: Branch Misses : 0
EVENT: Instructions : 0
EVENT: CPU Cycles : 0
EVENT: Ref. CPU Cycles : 0
EVENT: Bus Cycles : 0
EVENT: Cache References : 0
EVENT: Cache Misses : 0
$ gcc -DUSE_RAW_PMU -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Instructions : 914
EVENT: CPU Cycles : 4102
EVENT: Ref. CPU Cycles : 4959
EVENT: Bus Cycles : 171
EVENT: Cache References : 2
EVENT: Cache Misses : 2
$ gcc -DUSE_RAW_PMU -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
$ ./perf_bug
EVENT: Branch Instructions : 106
EVENT: Branch Misses : 6
EVENT: Instructions : 914
EVENT: CPU Cycles : 4108
EVENT: Ref. CPU Cycles : 10817
EVENT: Bus Cycles : 373
The perf tool itself seems to have the same issue.
With both the CACHE & BRANCH counters in the group, it does not work :
$ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
Performance counter stats for 'sleep 1':
<not counted> r0c4 (0.00%)
<not counted> r0c5 (0.00%)
<not counted> r0c0 (0.00%)
<not counted> r03c (0.00%)
<not counted> r0300 (0.00%)
<not counted> r013c (0.00%)
<not counted> r04F2E (0.00%)
<not supported> r0412E
1.001652932 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
Disabling the NMI watchdog makes no difference .
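A program reading the group can tell "counted zero events" apart from
"was never on the PMU" by adding PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING to read_format and comparing the two
times. The sketch below shows only that comparison; the names
classify() and scale_count() are hypothetical, and the scaling is the
usual enabled/running extrapolation, stated here as an assumption
rather than as what perf stat does internally.

```c
#include <stdint.h>

enum sched_state {
    NEVER_SCHEDULED,  /* running == 0: the count is meaningless */
    MULTIPLEXED,      /* on the PMU for only part of the enabled time */
    ALWAYS_RUNNING    /* on the PMU for the whole enabled time */
};

/* Hypothetical helper: classify a reading from its two time fields. */
enum sched_state classify(uint64_t time_enabled, uint64_t time_running)
{
    if (time_running == 0)
        return NEVER_SCHEDULED;
    if (time_running < time_enabled)
        return MULTIPLEXED;
    return ALWAYS_RUNNING;
}

/* Hypothetical helper: extrapolate a multiplexed count to the full
 * enabled time. Returns 0 when the event never ran, since there is
 * nothing to extrapolate from. */
uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                     uint64_t time_running)
{
    if (time_running == 0)
        return 0;
    return (uint64_t)((double)raw * (double)time_enabled /
                      (double)time_running);
}
```

If every member of the group reports a running time of 0, the zeros
are not real counts: the group as a whole was never scheduled.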
It is very strange that perf thinks 'r0412E' is not supported :
$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
event=0x2e,umask=0x41
The kernel should not be advertising an unsupported event
in a /sys/bus/event_source/devices/cpu/events/ file, should it ?
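For clarity, this is how a sysfs line such as 'event=0x2e,umask=0x41'
corresponds to the raw code 0x412E used with perf stat and with
PERF_TYPE_RAW. The sketch assumes the usual Intel 'cpu' PMU format,
with the event select in config bits 0-7 and the umask in bits 8-15
(see /sys/bus/event_source/devices/cpu/format/); raw_config() and
parse_sysfs_event() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: event select in bits 0-7, unit mask in bits 8-15. */
uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}

/* Hypothetical helper: turn "event=0x2e,umask=0x41" (or just
 * "event=0xc4") into a raw config value. Other fields that may appear
 * in such files (cmask, inv, ...) are ignored by this sketch.
 * Returns 0 on success, -1 if the text cannot be parsed. */
int parse_sysfs_event(const char *text, uint64_t *config_out)
{
    unsigned int event = 0, umask = 0;

    /* A missing ",umask=..." part leaves umask at 0. */
    if (sscanf(text, "event=%x,umask=%x", &event, &umask) < 1)
        return -1;
    if (event > 0xff || umask > 0xff)
        return -1;

    *config_out = raw_config((uint8_t)event, (uint8_t)umask);
    return 0;
}
```

By this encoding, LLC Reference (event 2EH, umask 4FH) becomes 0x4F2E
and Branch Instructions Retired (event C4H, umask 00H) becomes 0xC4,
matching the raw configs listed at the top of this message.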
So perf stat has the same problem. Without either the Cache or the
Branch counters, it seems to work fine:
without cache:
$ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c}:SIu' sleep 1
Performance counter stats for 'sleep 1':
37740 r0c4
3557 r0c5
188552 r0c0
311684 r03c
360963 r0300
12461 r013c
1.001508109 seconds time elapsed
without branch:
$ perf stat -e '{r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
Performance counter stats for 'sleep 1':
188554 r0c0
320242 r03c
452748 r0300
15633 r013c
4145 r04F2E
3022 r0412E
1.001810421 seconds time elapsed
proving again that perf's claim that 'r0412E' is not supported is bogus.
The Intel SDM's table 19-1 Architectural events, which ALL Intel CPUs
are meant to support, does include 'Event: 2EH | Umask: 4FH : LLC
Reference ' and 'Event: 2EH | Umask: 41H : LLC Miss' , as well as :
'Event : C4H | Umask: 00H : Branch Instructions Retired' and
'Event : C5H | Umask: 00H : Branch Misses Retired' .
So why can't perf count them all in the same group?
Please , can anyone enlighten me as to what is going on here ?
Why can't I count all of
( BRANCH_INSTRUCTIONS , BRANCH_MISSES ,
CACHE_REFERENCES, CACHE_MISSES
)
in the same Perf Event Group ?
Thanks in advance for any replies,
Best Regards,
Jason
View attachment "perf_bug.c" of type "text/x-csrc" (5785 bytes)