Date:   Thu, 13 Jul 2017 07:17:55 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Andres Freund <andres@...razel.de>, x86@...nel.org,
        linux-kernel@...r.kernel.org, live-patching@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
        "H. Peter Anvin" <hpa@...or.com>, Mike Galbraith <efault@....de>,
        Jiri Olsa <jolsa@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...radead.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>
Subject: Re: [PATCH v3 00/10] x86: ORC unwinder (previously undwarf)

On Thu, Jul 13, 2017 at 11:19:11AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@...radead.org> wrote:
> 
> > > One gloriously ugly hack would be to delay the userspace unwind to 
> > > return-to-userspace, at which point we have a schedulable context and can take 
> > > faults.
> 
> I don't think it's ugly, and it has various advantages:
> 
> > > Of course, then you have to somehow identify this later unwind sample with all 
> > > relevant prior samples and stitch the whole thing back together, but that 
> > > should be doable.
> > > 
> > > In fact, it would not be at all hard to do, just queue a task_work from the 
> > > NMI and have that do the EH based unwind.
> 
> This would have a couple of advantages:
> 
>  - as you mention, being able to fault in debug info and generally do 
>    IO/scheduling,
> 
>  - profiling overhead would be accounted to the task context that generates it,
>    not the NMI context,
> 
>  - there would be a natural batching/coalescing optimization if multiple events
>    hit the same system call: the user-space backtrace would only have to be looked 
>    up once for all samples that got collected.
> 
> This could be done by separating the user-space backtrace into a separate event, 
> and perf tooling would then apply the same user-space backtrace to all prior 
> kernel samples.
> 
> I.e. the ring-buffer would have trace entries like:
> 
>  [ kernel sample #1, with kernel backtrace #1 ]
>  [ kernel sample #2, with kernel backtrace #2 ]
>  [ kernel sample #3, with kernel backtrace #3 ]
>  [ user-space backtrace #1 at syscall return ]
>  ...
> 
> Note how the three kernel samples didn't have to do any user-space unwinding at 
> all, so the user-space unwinding overhead got reduced by a factor of 3.
> 
> Tooling would know that 'user-space backtrace #1' applies to the previous three 
> kernel samples.
> 
> Or so?
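
To make the task_work idea above a bit more concrete, here's a rough
sketch of what it could look like.  All the names here are made up
(perf_defer_user_unwind(), struct deferred_unwind, the elided emit
step), and since task_work_add() isn't NMI-safe, assume we've already
bounced out of the NMI via irq_work:

#include <linux/slab.h>
#include <linux/task_work.h>

struct deferred_unwind {
	struct callback_head work;
	u64 sample_id;	/* lets tooling stitch samples back together */
};

static void deferred_unwind_fn(struct callback_head *head)
{
	struct deferred_unwind *du =
		container_of(head, struct deferred_unwind, work);

	/*
	 * Schedulable task context on the way back to user space:
	 * the unwinder can fault in .eh_frame data here.
	 */

	/* ... unwind current's user stack, emit one user-backtrace
	   event tagged with du->sample_id ... */

	kfree(du);
}

/* Called (via irq_work) per sample, instead of unwinding right away. */
static int perf_defer_user_unwind(u64 sample_id)
{
	struct deferred_unwind *du;

	du = kzalloc(sizeof(*du), GFP_ATOMIC);
	if (!du)
		return -ENOMEM;

	init_task_work(&du->work, deferred_unwind_fn);
	du->sample_id = sample_id;

	if (task_work_add(current, &du->work, true)) {
		kfree(du);	/* task is exiting */
		return -ESRCH;
	}
	return 0;
}

A real version would also want to coalesce: if a task_work is already
queued for this task, just tag the new sample and skip the add, which
is where the "unwind once for N samples" win would come from.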

BTW, while we're throwing out ideas for this, here's another idea,
though it's almost certainly not a good one :-)

For user space stack unwinding, the kernel could do what its 'guess'
unwinder does for kernel stacks: scan the user space stack and return
all the text addresses it finds.

The results wouldn't be 100% accurate, but aggregated over enough
samples they could still turn out to be useful.
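
For reference, a minimal sketch of that scan, with the big caveat that
find_vma() needs mmap_sem, which we can't take from NMI context, so
the executable-mapping check is hand-waved; the 128-word scan depth is
arbitrary too:

#include <linux/mm.h>
#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/uaccess.h>

static void guess_user_backtrace(struct pt_regs *regs,
				 struct perf_callchain_entry_ctx *entry)
{
	unsigned long sp = user_stack_pointer(regs);
	struct vm_area_struct *vma;
	unsigned long word;
	int i;

	/* Walk up the user stack one word at a time. */
	for (i = 0; i < 128; i++, sp += sizeof(long)) {
		if (__copy_from_user_inatomic(&word,
					      (const void __user *)sp,
					      sizeof(word)))
			break;

		/* Keep anything pointing into an executable mapping. */
		vma = find_vma(current->mm, word);
		if (vma && vma->vm_start <= word &&
		    (vma->vm_flags & VM_EXEC))
			perf_callchain_store(entry, word);
	}
}

Stale return addresses from dead frames would show up as false hits,
but tooling could probably filter a lot of that out after the fact.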

-- 
Josh
