linux-kernel - Re: Broken dwarf unwinding - wrong stack pointer register value?


Message-ID: <1759553.TYpvCCURxk@agathebauer>
Date:   Sun, 21 Oct 2018 22:32:10 +0200
From:   Milian Wolff <milian.wolff@...b.com>
To:     Milian Wolff <mail@...ianw.de>
Cc:     linux-kernel@...r.kernel.org, Jiri Olsa <jolsa@...nel.org>,
        namhyung@...nel.org, linux-perf-users@...r.kernel.org,
        Arnaldo Carvalho <acme@...nel.org>
Subject: Re: Broken dwarf unwinding - wrong stack pointer register value?

On Sonntag, 21. Oktober 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
> 
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
> 
> I am currently using this trivial C++ code to reproduce the issue:
> 
> ```
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ```
> 
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:

With my patch to regularly flush the perf script output buffer, we can now 
easily find all broken backtraces and the corresponding debug output via:

    $ perf script --ns -v |& awk -v RS='' '/\[unknown\]/ {print "\n"$0}'

I've pasted the output of the above command from my machine here:
https://paste.kde.org/pmyxwkk1k

This contains 139 samples with broken unwinding, out of 2350 samples in total, 
so about 6% of all samples are broken.

In many cases, the first accessed memory is 0 because a too-low offset into 
the stack is computed from the SP value, similar to the scenario I described 
in my initial mail. In other cases we read garbage addresses such as 

unwind: access_mem addr 0x7ffc80811cf0 val 408195dfbda90580, offset 24

In all cases except for the two samples at the very start and end of this 
log, the last offset encountered in access_mem is lower than 72. Remember what 
I wrote in the initial mail: when I manually hacked the access_mem function to 
use 72 for one of the broken samples, unwinding magically worked again...

With this additional data, can anyone shed some light on what's potentially 
going on here? How can we improve this situation?

Thanks
-- 
Milian Wolff | milian.wolff@...b.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts