Message-ID: <CALyZvKy7HVvjhJ5VTKKicvkq98GbJhGfCiMXOHNYseiOGu1T=A@mail.gmail.com>
Date: Tue, 13 Feb 2018 18:23:35 +0000
From: Jason Vas Dias <jason.vas.dias@...il.com>
To: linux-kernel@...r.kernel.org
Subject: Re: perf Intel x86_64 : BUG: BRANCH_INSTRUCTIONS / BRANCH_MISSES
 cannot be combined with CACHE_REFERENCES / CACHE_MISSES .

On 13/02/2018, Jason Vas Dias <jason.vas.dias@...il.com> wrote:
> Good day -
>
> I'd much appreciate some advice as to why, on my Intel x86_64
> ( DisplayFamily_DisplayModel : 06_3CH ), running either Linux 4.12.10,
> or Linux 3.10.0, any attempt to count all of :
> PERF_COUNT_HW_BRANCH_INSTRUCTIONS
> (or raw config 0xC4) , and
> PERF_COUNT_HW_BRANCH_MISSES
> (or raw config 0xC5),
> combined with
> PERF_COUNT_HW_CACHE_REFERENCES
> (or raw config 0x4F2E ), and
> PERF_COUNT_HW_CACHE_MISSES
> (or raw config 0x412E) ,
> results in ALL COUNTERS BEING 0 in a read of the Group FD or
> mmap sample area.
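
For reference, the raw configs quoted above follow the x86 layout that the
sysfs format files describe: the event select sits in config bits 0-7 and
the unit mask in bits 8-15, so 0x4F2E is event 0x2E with umask 0x4F. A
minimal sketch of that encoding; the helper names are made up for
illustration and are not taken from perf_bug.c:

```c
#include <stdint.h>

/* Build an x86 raw PMU config: event select in bits 0-7,
 * unit mask in bits 8-15. Helper names are hypothetical. */
static inline uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}

/* Split a raw config back into its two fields. */
static inline uint8_t raw_event(uint64_t config)
{
    return (uint8_t)(config & 0xFF);
}

static inline uint8_t raw_umask(uint64_t config)
{
    return (uint8_t)((config >> 8) & 0xFF);
}
```

With these, raw_config(0x2E, 0x41) gives the 0x412E used for
PERF_COUNT_HW_CACHE_MISSES above.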
>
> This is demonstrated by the example program, which will
> use perf_event_open() to create a Group Leader FD for the first event,
> and associate all other events with that Event Group , so that it
> will read all events on the group FD .
>
> The perf_event_open() calls and the ioctl(event_fd, PERF_EVENT_IOC_ID, &id)
> calls all return successfully , but if I combine ANY of
> ( PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
> PERF_COUNT_HW_BRANCH_MISSES
> ) with any of
> ( PERF_COUNT_HW_CACHE_REFERENCES,
> PERF_COUNT_HW_CACHE_MISSES
> ) in the Event Group, ALL events have '0' event->value.
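
To make the read path concrete: per perf_event_open(2), a read() on a
group leader opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID
returns a u64 count 'nr' followed by 'nr' pairs of { value, id }, where
'id' is what PERF_EVENT_IOC_ID reported for each event. A minimal sketch
of looking up one event in such a buffer, assuming exactly that
read_format (no time fields); group_value() is a hypothetical helper and
may differ from what perf_bug.c does:

```c
#include <stddef.h>
#include <stdint.h>

/* Find the count for the event whose PERF_EVENT_IOC_ID is `id` in a
 * buffer filled by read() on the group leader.
 * Assumed layout: buf[0] = nr, then nr pairs of (value, id).
 * Returns 0 on success, -1 if the buffer is too short or the id is
 * not part of the group. */
static int group_value(const uint64_t *buf, size_t len,
                       uint64_t id, uint64_t *out)
{
    if (len < sizeof(uint64_t))
        return -1;

    uint64_t nr = buf[0];
    /* Reject a header that claims more entries than the buffer holds. */
    if (nr > (len - sizeof(uint64_t)) / (2 * sizeof(uint64_t)))
        return -1;

    for (uint64_t i = 0; i < nr; i++) {
        uint64_t value = buf[1 + 2 * i];
        uint64_t ev_id = buf[2 + 2 * i];
        if (ev_id == id) {
            *out = value;
            return 0;
        }
    }
    return -1;
}
```

Matching on the id rather than on position avoids depending on the order
in which the kernel lists the group members.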
>
> Demo :
> 1. Compile program to use kernel mapped Generic Events:
> $ gcc -std=gnu11 -o perf_bug perf_bug.c
> Running program shows all counters have 0 values, since both
> CACHE & BRANCH hits+misses are being requested:
>
> $ ./perf_bug
> EVENT: Branch Instructions : 0
> EVENT: Branch Misses : 0
> EVENT: Instructions : 0
> EVENT: CPU Cycles : 0
> EVENT: Ref. CPU Cycles : 0
> EVENT: Bus Cycles : 0
> EVENT: Cache References : 0
> EVENT: Cache Misses : 0
>
> NOT registering interest in EITHER the BRANCH counters
> OR the CACHE counters fixes the problem:
>
> Compile without registering for BRANCH_INSTRUCTIONS
> or BRANCH_MISSES:
> $ gcc -std=gnu11 -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4110
> EVENT: Ref. CPU Cycles : 4437
> EVENT: Bus Cycles : 152
> EVENT: Cache References : 1
> EVENT: Cache Misses : 1
>
> Compile without registering for CACHE_REFERENCES or CACHE_MISSES:
> $ gcc -std=gnu11 -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Branch Instructions : 106
> EVENT: Branch Misses : 6
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4132
> EVENT: Ref. CPU Cycles : 8526
> EVENT: Bus Cycles : 295
>
> The same thing happens if I do not use Generic Events, but rather
> "dynamic raw PMU" events, by putting the hex values from
> /sys/bus/event_source/devices/cpu/events/? into the perf_event_attr
> config, OR'ed with (1<<63), and using the PERF_TYPE_RAW perf_event_attr
> type value :
>
> $ gcc -DUSE_RAW_PMU -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Branch Instructions : 0
> EVENT: Branch Misses : 0
> EVENT: Instructions : 0
> EVENT: CPU Cycles : 0
> EVENT: Ref. CPU Cycles : 0
> EVENT: Bus Cycles : 0
> EVENT: Cache References : 0
> EVENT: Cache Misses : 0
>
>
> $ gcc -DUSE_RAW_PMU -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4102
> EVENT: Ref. CPU Cycles : 4959
> EVENT: Bus Cycles : 171
> EVENT: Cache References : 2
> EVENT: Cache Misses : 2
>
> $ gcc -DUSE_RAW_PMU -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Branch Instructions : 106
> EVENT: Branch Misses : 6
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4108
> EVENT: Ref. CPU Cycles : 10817
> EVENT: Bus Cycles : 373
>
>
> The perf tool itself seems to have the same issue:
>
> With both the CACHE & BRANCH counters, it does not work :
> $ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
>
> Performance counter stats for 'sleep 1':
>
> <not counted> r0c4 (0.00%)
> <not counted> r0c5 (0.00%)
> <not counted> r0c0 (0.00%)
> <not counted> r03c (0.00%)
> <not counted> r0300 (0.00%)
> <not counted> r013c (0.00%)
> <not counted> r04F2E (0.00%)
> <not supported> r0412E
>
> 1.001652932 seconds time elapsed
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
>
> Disabling the NMI watchdog makes no difference .
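
As far as I understand the perf stat output, the "(0.00%)" column is the
share of the enabled time during which the event was actually running on
the PMU, and "<not counted>" is printed when that running time is zero.
The same check can be made in a test program by also requesting
PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING in
read_format, which makes it possible to tell "counted zero events" apart
from "never scheduled". A minimal sketch of the usual scaling;
scaled_count() is a hypothetical helper, not part of perf or perf_bug.c:

```c
#include <stdint.h>

/* Scale a raw count by time_enabled / time_running, as is commonly
 * done for multiplexed events.
 * Returns 0 and stores the estimate in *out, or -1 if the event never
 * ran (time_running == 0), in which case the raw value of 0 says
 * nothing about the workload. */
static int scaled_count(uint64_t value, uint64_t time_enabled,
                        uint64_t time_running, double *out)
{
    if (time_running == 0)
        return -1;
    *out = (double)value * (double)time_enabled / (double)time_running;
    return 0;
}
```

A return of -1 for every member of the group would suggest that the
group was never put on the hardware at all, rather than that the
counters misbehaved while running.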
>
> It is very strange that perf thinks 'r0412E' is not supported :
> $ cat /sys/bus/event_source/devices/cpu/events/cache-misses
> event=0x2e,umask=0x41
>
> The kernel should not be advertising an unsupported event
> in a /sys/bus/event_source/devices/cpu/events/ file, should it ?
>
> So perf stat has the same problem; without either the Cache or the Branch
> counters it seems to work fine:
>
> without cache:
> $ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c}:SIu' sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 37740 r0c4
> 3557 r0c5
> 188552 r0c0
> 311684 r03c
> 360963 r0300
> 12461 r013c
>
> 1.001508109 seconds time elapsed
>
> without branch:
> $ perf stat -e '{r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 188554 r0c0
> 320242 r03c
> 452748 r0300
> 15633 r013c
> 4145 r04F2E
> 3022 r0412E
>
> 1.001810421 seconds time elapsed
>
> proving again that perf's claim that 'r0412E' is not supported is bogus.
> The Intel SDM's table 19-1 Architectural events, which ALL Intel CPUs
> are meant to support, does include 'Event: 2EH | Umask: 4FH : LLC
> Reference ' and 'Event: 2EH | Umask: 41H : LLC Miss' , as well as :
> 'Event : C4H | Umask: 00H : Branch Instructions Retired' and
> 'Event : C5H | Umask: 00H : Branch Misses Retired' .
> So why can't perf count them all in the same group?
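
Whether a given CPU really implements each architectural event, and how
many general-purpose counters it offers per logical processor, is
reported by CPUID leaf 0AH: EAX[15:8] holds the counter count, EAX[31:24]
the length of the EBX bit vector, and a SET bit in EBX means the
corresponding event is NOT available. A minimal sketch of decoding those
registers; the function and enum names are made up for illustration, and
the register values have to be obtained separately (for example with
__get_cpuid() from GCC's <cpuid.h> on x86):

```c
#include <stdint.h>

/* Bit positions in CPUID.0AH:EBX; a set bit means "not available". */
enum arch_event {
    ARCH_CORE_CYCLES         = 0,
    ARCH_INSTRUCTIONS        = 1,
    ARCH_REF_CYCLES          = 2,
    ARCH_LLC_REFERENCE       = 3,
    ARCH_LLC_MISSES          = 4,
    ARCH_BRANCH_INSTRUCTIONS = 5,
    ARCH_BRANCH_MISSES       = 6
};

/* Number of general-purpose counters per logical processor. */
static inline unsigned pmu_gp_counters(uint32_t eax)
{
    return (eax >> 8) & 0xFF;
}

/* Non-zero if the architectural event is enumerated and available. */
static inline int arch_event_available(uint32_t eax, uint32_t ebx,
                                       enum arch_event ev)
{
    unsigned vec_len = (eax >> 24) & 0xFF;
    return (unsigned)ev < vec_len && !(ebx & (1u << ev));
}
```

Comparing pmu_gp_counters() with the number of events in the group that
cannot use a fixed-function counter may be worth doing before concluding
that a particular pair of events is incompatible.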
>
> Please , can anyone enlighten me as to what is going on here ?
>
> Why can't I count all of
> ( BRANCH_INSTRUCTIONS , BRANCH_MISSES ,
> CACHE_REFERENCES, CACHE_MISSES
> )
> in the same Perf Event Group ?
>
> Thanks in advance for any replies,
> Best Regards,
> Jason
>

Actually, it appears that ONLY the combination of
'BRANCH_MISSES' and 'CACHE_MISSES' makes
all sampled counter values 0; if either counter is
not requested, all other counters have non-zero values.
I've updated the program to reflect this.

And, contrary to what I wrote above, the nmi_watchdog
setting DOES make a difference:
if nmi_watchdog is > 0, then combining ANY BRANCH counter
(BRANCH_INSTRUCTIONS or BRANCH_MISSES) with ANY CACHE counter
(CACHE_REFERENCES or CACHE_MISSES) makes ALL
sampled counter values 0, but if nmi_watchdog is 0,
then only the combination of BRANCH_MISSES and
CACHE_MISSES makes all sampled values 0.
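
Since the result depends on the watchdog state, it may help for a test
program to print that state next to the counter values. A minimal sketch
that reads an integer from a file such as /proc/sys/kernel/nmi_watchdog;
read_int_file() is a hypothetical helper and is not part of the attached
perf_bug.c:

```c
#include <stdio.h>

/* Return the integer stored at the start of `path`, or -1 if the file
 * cannot be opened or does not begin with an integer. */
static int read_int_file(const char *path)
{
    FILE *f = fopen(path, "r");
    int v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

Calling read_int_file("/proc/sys/kernel/nmi_watchdog") before opening the
event group would record which of the two behaviours described above to
expect for that run.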

This is a really nasty bug and makes the Linux perf facility
rather unusable: because of it, Linux perf provides
no way of measuring cache and branch prediction performance
at the same time for the same instruction sequence.

Can anyone suggest a valid reason for this?
Or any workarounds?

If no one suggests a workaround or a valid reason,
I guess I should raise this as a serious bug.

Thanks & Regards,
Jason

View attachment "perf_bug.c" of type "text/x-csrc" (6161 bytes)