linux-kernel - Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding


Message-ID: <20181112032637.GG6218@tassilo.jf.intel.com>
Date:   Sun, 11 Nov 2018 19:26:37 -0800
From:   Andi Kleen <ak@...ux.intel.com>
To:     Travis Downs <travis.downs@...il.com>
Cc:     Milian Wolff <milian.wolff@...b.com>, jolsa@...hat.com,
        linux-kernel@...r.kernel.org, jolsa@...nel.org,
        namhyung@...nel.org, linux-perf-users@...r.kernel.org,
        acme@...nel.org
Subject: Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf
 unwinding - wrong stack pointer register value?]

On Sat, Nov 10, 2018 at 09:50:05PM -0500, Travis Downs wrote:
>    On Sat, Nov 10, 2018 at 8:07 PM Andi Kleen <ak@...ux.intel.com> wrote:
> 
>      On Sat, Nov 10, 2018 at 04:42:48PM -0500, Travis Downs wrote:
>      > I guess this problem doesn't occur for LBR unwinding since the LBR
>      > records are captured at the same
>      > moment in time as the PEBS record, so reflect the correct branch
>      > sequence.
> 
>      Actually it happens with LBRs too, but it always gives the backtrace
>      consistently at the PMI trigger point.
> 
>    That's weird - so the LBR records are from the PMI point, but the rest of
>    the PEBS record comes from the PEBS trigger point? Or the LBR isn't part
>    of PEBS at all?

LBR is not part of PEBS, but is collected separately in the PMI handler.

>      > overhead calculations will be based on the captured stacks, I guess -
>      > but when I annotate, will the values I see correspond to the PEBS IPs
>      > or the PMI IPs?
> 
>      Based on PEBS IPs.
> 
>      It would be a good idea to add a check to perf report
>      that the two IPs are different, and if they differ
>      add some indicator to the sample. This could be a new sort key,
>      although that would waste some space on the screen, or something
>      else.
> 
>    In the case that PEBS events are used, the IP will differ essentially 100%
>    of the time, right? That is, there will always be *some* skid.

I wouldn't say that.  It depends on what the CPU is doing and the IPC
of the code.

Also, the backtrace inconsistency can only happen if the sample races with
a function return. If it doesn't, the backtrace will point
to the correct function, even though the unwind IP is different.

For example, in the common case where you profile a long loop, it
is unlikely to happen.


>    indicating otherwise above), I could imagine a hybrid mode where LBR is
>    used to go back some number of calls and then dwarf or FP or whatever
>    unwinding takes over, because the further down the stack you go the more
>    likely the PEBS trigger point and PMI point are to have a
>    consistent stack.

We could collect numbers on how often it happens, but it would surprise
me if anything complicated were worth it. I would just do the minimum fixes
to address the unwinder errors, and perhaps add the "unwind ip differs"
indication.
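
A rough sketch of what such an "unwind ip differs" indication could
compute per sample, written in Python for brevity rather than as perf
code. Everything here is an assumption for illustration: the Sample and
Symbol types, the field names pebs_ip and unwind_ip, and the three labels
are hypothetical and do not correspond to existing perf structures or
sort keys. The "same function" case is only a heuristic for "the
backtrace still points to the correct function".

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Symbol(NamedTuple):
    start: int  # first address of the function
    end: int    # one past the last address (exclusive)
    name: str


class Sample(NamedTuple):
    pebs_ip: int    # hypothetical: IP taken from the PEBS record
    unwind_ip: int  # hypothetical: IP in the registers captured at the PMI


def find_symbol(symbols: List[Symbol], ip: int) -> Optional[Symbol]:
    """Return the symbol covering ip; symbols must be sorted by start."""
    starts = [s.start for s in symbols]
    i = bisect_right(starts, ip) - 1
    if i >= 0 and symbols[i].start <= ip < symbols[i].end:
        return symbols[i]
    return None


def classify(sample: Sample, symbols: List[Symbol]) -> str:
    """Label a sample by how far the unwind IP drifted from the PEBS IP."""
    if sample.pebs_ip == sample.unwind_ip:
        return "same ip"
    pebs_sym = find_symbol(symbols, sample.pebs_ip)
    unwind_sym = find_symbol(symbols, sample.unwind_ip)
    if pebs_sym is not None and pebs_sym == unwind_sym:
        # Skid stayed inside one function, so the caller chain is
        # probably still the right one for the PEBS IP.
        return "ip differs, same function"
    # The two IPs resolve to different (or unknown) functions: the
    # unwound stack may not belong to the sampled IP.
    return "ip differs, function differs"


symbols = [Symbol(0x1000, 0x1100, "loop"), Symbol(0x1100, 0x1200, "leaf")]
samples = [Sample(0x1010, 0x1010), Sample(0x1010, 0x1050), Sample(0x1150, 0x1050)]
labels = [classify(s, symbols) for s in samples]
```

Counting the labels over a whole profile would also give the "how often
does it happen" numbers mentioned above.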

-Andi