linux-kernel - Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <2027973.spD7XaVQdP@milian-kdab2>
Date:   Tue, 05 Sep 2017 14:26:38 +0200
From:   Milian Wolff <milian.wolff@...b.com>
To:     Andi Kleen <andi@...stfloor.org>
Cc:     linux-perf-users@...r.kernel.org,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        linux-kernel@...r.kernel.org, peterz@...radead.org
Subject: Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]

On Tuesday, September 5, 2017 5:40:58 AM CEST Andi Kleen wrote:
> > The cycle value gets associated with a sample via it's period value, which
> > is used by `perf report` in the analysis. If I get a single "broken"
> > sample with
>
> I always thought it just used the number of samples?

No, that is not the case. It uses the cycle period as "weight" by default. 
Note also the recent patch set for `perf annotate` by Taeung Song which adds 
the functionality to switch between sample or period fractions. I'm actually 
not sure whether that exists for `perf report` yet.

In some situations where the cycle period values associated with the samples 
are correct, I actually also saw how this is useful: I have seen perf data 
files, where tons of samples got recorded around syscall entry/exit, but most 
samples only had tiny cycle values associated with them. If one only looks at 
number of samples, then it would look like the syscalls are expensive. While 
in reality, way more cycles are executed in userspace. This issue was/is 
apparent when looking at `perf report` values vs. the FlameGraph visualization 
created by the normal `stackcollapse-perf.pl` script. The latter does only 
look at the number of samples, the former takes the sample period value. 
Hotspot also used the former approach, but then I adapted perf's behavior.

> > a cycle count of, say 1E14 and then a million other samples, each with
> > "sane" cycle counts of let's say 1E5, then the one broken sample will
> > hold 50% of the total amount of measured cycles. So perf report will show
> > that the function where the broken sample points to will have a cost of
> > 50%.
> 
> I don't think I've seen such a situation. Did you?

I have seen this situation. This is what this current revival of this thread 
is all about. Without such issues, I wouldn't claim it's such a serious issue.

> BTW I'm not arguing against fixing it, but typically I just
> recommend to avoid frequency mode. The fast sampling at the beginning
> has caused a range of low level PMU bugs and it is hard to reason about
> because of its complex behavior. Also it has no protection against
> synchronizing with repeated patterns in the execution, which
> can cause bad shadowing effects.  If you use the Intel
> event aliases they have all sensible periods set by default.

I think we both agree that the frequency mode as-is should not be the default. 
But it is, and this is a serious issue in my opinion. We will need to find a 
sensible default for the event period and use that mode by default. I nowadays 
always add something like `-c 3000000` which gives me roughly 1k samples per 
second on a ~3GHz machine. It's just a ballpark figure and of course gets 
influenced by frequency scaling, but it's good enough for me. We could use a 
similar approach to find a period based on the max CPU clock rate 
automatically. But of course that would only work for cycles, and not for 
instructions or any of the other fancy event counters. But since the frequency 
mode is probably similarly broken there, it should not be the default. Better 
ask the user for explicit values rather than doing something automatically 
which can lead to broken results.

Cheers

-- 
Milian Wolff | milian.wolff@...b.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
Download attachment "smime.p7s" of type "application/pkcs7-signature" (3826 bytes)