Message-ID: <CALyZvKy7HVvjhJ5VTKKicvkq98GbJhGfCiMXOHNYseiOGu1T=A@mail.gmail.com>
Date:   Tue, 13 Feb 2018 18:23:35 +0000
From:   Jason Vas Dias <jason.vas.dias@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: Re: perf Intel x86_64 : BUG: BRANCH_INSTRUCTIONS / BRANCH_MISSES
 cannot be combined with CACHE_REFERENCES / CACHE_MISSES .

On 13/02/2018, Jason Vas Dias <jason.vas.dias@...il.com> wrote:
> Good day -
>
> I'd much appreciate some advice as to why, on my Intel x86_64
> ( DisplayFamily_DisplayModel : 06_3CH ), running either Linux 4.12.10,
> or Linux 3.10.0, any attempt to count all of :
>      PERF_COUNT_HW_BRANCH_INSTRUCTIONS
>           (or raw config 0xC4) , and
>      PERF_COUNT_HW_BRANCH_MISSES
>           (or raw config 0xC5), and
>      combined with
>      PERF_COUNT_HW_CACHE_REFERENCES
>          (or raw config 0x4F2E ), and
>      PERF_COUNT_HW_CACHE_MISSES
>          (or raw config 0x412E) ,
> results in ALL COUNTERS BEING 0 in a read of the Group FD or
> mmap sample area.
>
> This is demonstrated by the example program, which will
> use perf_event_open() to create a Group Leader FD  for the first event,
> and associate all other events with that Event Group , so that it
> will read all events on the group FD .
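
A minimal sketch of the group read described above, not the code from the attached perf_bug.c. It assumes the group leader was opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID, in which case perf_event_open(2) documents the read() buffer as a u64 count followed by { value, id } pairs. The helper name group_value_by_id is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Buffer layout assumed here (PERF_FORMAT_GROUP | PERF_FORMAT_ID):
 *   buf[0]         = nr, the number of events in the group
 *   buf[1 + 2*i]   = value of event i
 *   buf[2 + 2*i]   = id of event i, as returned by PERF_EVENT_IOC_ID
 *
 * Looks up the count for the event with the given id.
 * Returns 0 and stores the count in *out on success, or -1 if the
 * buffer is too short for nr entries or the id is not in the group.
 */
int group_value_by_id(const uint64_t *buf, size_t len,
                      uint64_t id, uint64_t *out)
{
    if (len < 1)
        return -1;

    uint64_t nr = buf[0];
    if (nr > (len - 1) / 2)
        return -1;              /* truncated read */

    for (uint64_t i = 0; i < nr; i++) {
        if (buf[2 + 2 * i] == id) {
            *out = buf[1 + 2 * i];
            return 0;
        }
    }
    return -1;
}
```

The buffer passed in would be whatever a single read() on the group leader FD filled in, with len being the number of bytes read divided by 8.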
>
> The perf_event_open() calls and the ioctl(event_fd, PERF_EVENT_IOC_ID, &id)
> calls all return successfully , but if I combine ANY of
> ( PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
>   PERF_COUNT_HW_BRANCH_MISSES
> ) with any of
> ( PERF_COUNT_HW_CACHE_REFERENCES,
>   PERF_COUNT_HW_CACHE_MISSES
> ) in the Event Group, ALL events have '0' event->value.
>
> Demo :
> 1. Compile program to use kernel mapped Generic Events:
>   $ gcc -std=gnu11 -o perf_bug perf_bug.c
>   Running program shows all counters have 0 values, since both
>   CACHE & BRANCH hits+misses are being requested:
>
>   $ ./perf_bug
>   EVENT: Branch Instructions : 0
>   EVENT: Branch Misses : 0
>   EVENT: Instructions : 0
>   EVENT: CPU Cycles : 0
>   EVENT: Ref. CPU Cycles : 0
>   EVENT: Bus Cycles : 0
>   EVENT: Cache References : 0
>   EVENT: Cache Misses : 0
>
>   NOT registering interest in EITHER the BRANCH counters
>   OR the CACHE counters fixes the problem:
>
>   Compile without registering for BRANCH_INSTRUCTIONS
>   or BRANCH_MISSES:
>   $ gcc -std=gnu11 -DNO_BUG_NO_BRANCH  -o perf_bug perf_bug.c
>   $ ./perf_bug
>   EVENT: Instructions : 914
>   EVENT: CPU Cycles : 4110
>   EVENT: Ref. CPU Cycles : 4437
>   EVENT: Bus Cycles : 152
>   EVENT: Cache References : 1
>   EVENT: Cache Misses : 1
>
>   Compile without registering for CACHE_REFERENCES or CACHE_MISSES:
>   $ gcc -std=gnu11 -DNO_BUG_NO_CACHE  -o perf_bug perf_bug.c
>   $ ./perf_bug
>   EVENT: Branch Instructions : 106
>   EVENT: Branch Misses : 6
>   EVENT: Instructions : 914
>   EVENT: CPU Cycles : 4132
>   EVENT: Ref. CPU Cycles : 8526
>   EVENT: Bus Cycles : 295
>
> The same thing happens if I do not use Generic Events, but rather
> "dynamic raw PMU" events, by putting the hex values from
> /sys/bus/event_source/devices/cpu/events/? into the perf_event_attr
> config, OR'ed with (1<<63), and using the PERF_TYPE_RAW perf_event_attr
> type value :
>
> $ gcc -DUSE_RAW_PMU -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Branch Instructions : 0
> EVENT: Branch Misses : 0
> EVENT: Instructions : 0
> EVENT: CPU Cycles : 0
> EVENT: Ref. CPU Cycles : 0
> EVENT: Bus Cycles : 0
> EVENT: Cache References : 0
> EVENT: Cache Misses : 0
>
>
> $ gcc -DUSE_RAW_PMU -DNO_BUG_NO_BRANCH -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4102
> EVENT: Ref. CPU Cycles : 4959
> EVENT: Bus Cycles : 171
> EVENT: Cache References : 2
> EVENT: Cache Misses : 2
>
> $ gcc -DUSE_RAW_PMU -DNO_BUG_NO_CACHE -o perf_bug perf_bug.c
> $ ./perf_bug
> EVENT: Branch Instructions : 106
> EVENT: Branch Misses : 6
> EVENT: Instructions : 914
> EVENT: CPU Cycles : 4108
> EVENT: Ref. CPU Cycles : 10817
> EVENT: Bus Cycles : 373
>
>
> The perf tool itself seems to have the same issue:
>
> With both CACHE & BRANCH counters, it does not work:
> $ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
>
>  Performance counter stats for 'sleep 1':
>
>      <not counted>      r0c4      (0.00%)
>      <not counted>      r0c5      (0.00%)
>      <not counted>      r0c0      (0.00%)
>      <not counted>      r03c      (0.00%)
>      <not counted>      r0300     (0.00%)
>      <not counted>      r013c     (0.00%)
>      <not counted>      r04F2E    (0.00%)
>    <not supported>     r0412E
>
>        1.001652932 seconds time elapsed
>
>    Some events weren't counted. Try disabling the NMI watchdog:
> 	echo 0 > /proc/sys/kernel/nmi_watchdog
> 	perf stat ...
> 	echo 1 > /proc/sys/kernel/nmi_watchdog
>
> Disabling the NMI watchdog makes no difference .
>
> It is very strange that perf thinks 'r0412E' is not supported:
>    $ cat /sys/bus/event_source/devices/cpu/events/cache-misses
>    event=0x2e,umask=0x41
>
> The kernel should not be advertising an unsupported event
> in a /sys/bus/event_source/devices/cpu/events/ file, should it?
>
> So perf stat has the same problem; with either the Cache or the Branch
> counters left out, it seems to work fine:
>
> without cache:
> $ perf stat -e '{r0c4,r0c5,r0c0,r03c,r0300,r013c}:SIu' sleep 1
>
>  Performance counter stats for 'sleep 1':
>
>              37740      r0c4
>               3557      r0c5
>             188552      r0c0
>             311684      r03c
>             360963      r0300
>              12461      r013c
>
>        1.001508109 seconds time elapsed
>
> without branch:
> $ perf stat -e '{r0c0,r03c,r0300,r013c,r04F2E,r0412E}:SIu' sleep 1
>
>  Performance counter stats for 'sleep 1':
>
>             188554      r0c0
>             320242      r03c
>             452748      r0300
>              15633      r013c
>               4145      r04F2E
>               3022      r0412E
>
>        1.001810421 seconds time elapsed
>
> proving again that perf's claim that 'r0412E' is not supported is bogus.
> The Intel SDM's table 19-1 Architectural events, which ALL Intel CPUs
> are meant to support, does include  'Event: 2EH | Umask: 4FH : LLC
> Reference '  and  'Event: 2EH | Umask: 41H : LLC Miss' , as well as :
> 'Event : C4H | Umask: 00H : Branch Instructions Retired' and
> 'Event : C5H | Umask: 00H : Branch Misses Retired' .
> So why can't perf count them all in the same group?
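
For reference, the raw configs quoted above follow from the SDM event/umask pairs under the usual Intel core PMU encoding, where the event select occupies config bits 0-7 and the unit mask bits 8-15. A small sketch of that arithmetic; the helper names raw_config, raw_event and raw_umask are hypothetical and not part of any kernel API.

```c
#include <stdint.h>

/* Compose a PERF_TYPE_RAW config from an Intel event select and unit mask:
 * event select goes in bits 0-7, unit mask in bits 8-15. */
uint64_t raw_config(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}

/* Extract the event select (bits 0-7) from a raw config. */
uint8_t raw_event(uint64_t config)
{
    return (uint8_t)(config & 0xff);
}

/* Extract the unit mask (bits 8-15) from a raw config. */
uint8_t raw_umask(uint64_t config)
{
    return (uint8_t)((config >> 8) & 0xff);
}
```

With this encoding, event 2EH / umask 4FH gives 0x4F2E, event 2EH / umask 41H gives 0x412E, and events C4H and C5H with umask 00H give 0xC4 and 0xC5, matching the values used in the report.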
>
> Please , can anyone enlighten me as to what is going on here ?
>
> Why can't I count all of
>    ( BRANCH_INSTRUCTIONS , BRANCH_MISSES ,
>      CACHE_REFERENCES, CACHE_MISSES
>   )
> in the same Perf Event Group ?
>
> Thanks in advance for any replies,
> Best Regards,
> Jason
>

Actually, it appears that ONLY the combination of
'BRANCH_MISSES' and 'CACHE_MISSES' makes
all sampled counter values 0; if either counter is
not requested, all other counters have non-zero values.
I've updated the program to reflect this.

And nmi_watchdog=1 DOES make a difference:
if nmi_watchdog is > 0, then ANY combination
of {BRANCH,CACHE}_{REFS,MISSES} makes ALL
sampled counter values 0, but if nmi_watchdog is 0,
then only the combination of BRANCH_MISSES and
CACHE_MISSES makes all sampled values 0.
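
One way to tell "the group was never scheduled" apart from "the events really counted zero" is to look at the enabled/running times, which appear to be what the "(0.00%)" column in the perf stat output reflects. A minimal sketch, assuming the leader is opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING (the attached perf_bug.c may not set these), so that the read() buffer starts with nr, time_enabled, time_running. The names classify_group and enum group_state are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum group_state {
    GROUP_BAD_BUFFER  = -1, /* fewer than three u64 words were read */
    GROUP_NEVER_RAN   = 0,  /* time_running == 0: counts are not meaningful */
    GROUP_MULTIPLEXED = 1,  /* ran for only part of the enabled time */
    GROUP_RAN_FULLY   = 2   /* on the PMU for the whole enabled time */
};

/*
 * Buffer layout assumed here:
 *   buf[0] = nr
 *   buf[1] = time_enabled (ns)
 *   buf[2] = time_running (ns)
 *   followed by the per-event values
 */
enum group_state classify_group(const uint64_t *buf, size_t len)
{
    if (len < 3)
        return GROUP_BAD_BUFFER;

    uint64_t enabled = buf[1];
    uint64_t running = buf[2];

    if (running == 0)
        return GROUP_NEVER_RAN;     /* also covers a never-enabled group */
    if (running < enabled)
        return GROUP_MULTIPLEXED;
    return GROUP_RAN_FULLY;
}
```

If the all-zero readings come back as GROUP_NEVER_RAN, that would point at the group as a whole not fitting on the available counters, rather than at the individual events.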

This is a really nasty bug and makes the Linux PERF facility
rather unusable; because of this bug, Linux PERF provides
no way of measuring cache and branch prediction performance
at the same time for the same instruction sequence.

Can anyone suggest a valid reason for this?
Or any workarounds?

If no-one suggests a workaround or valid reason,
I guess I should raise this as a serious bug.

Thanks & Regards,
Jason

View attachment "perf_bug.c" of type "text/x-csrc" (6161 bytes)
