linux-kernel - Re: [PATCH RFC 1/6] perf/x86: Add perf text poke event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191111172935.GA11972@willie-the-truck>
Date:   Mon, 11 Nov 2019 17:29:35 +0000
From:   Will Deacon <will@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Leo Yan <leo.yan@...aro.org>, Mark Rutland <mark.rutland@....com>,
        Mike Leach <mike.leach@...aro.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Mathieu Poirier <mathieu.poirier@...aro.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 1/6] perf/x86: Add perf text poke event

On Mon, Nov 11, 2019 at 05:05:05PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 11, 2019 at 03:39:25PM +0000, Will Deacon wrote:
> 
> > Backing up though, I think I'm missing details about what this thread is
> > trying to achieve. You're adding perf events so that coresight trace can
> > take into account modifications of the kernel text, right?
> 
> Yes, because ARM-CS / Intel-PT need to know the exact text at any one
> time in order to correctly decode their traces.
> 
> > If so:
> >
> >   * Does this need to take into account more than just jump_label()?
> 
> jump_label seems to be the initial target Adrian set, but yes, it needs
> to cover 'everything'.

Including alternatives, which are what get me worried since the potential
for recursion is pretty high there (on arm64, at least).

> That is, all single instruction patching crud around:
> 
>  - optimized kprobes
>    (regular kprobes are exempt because they rely on exceptions)
>  - jump_labels
>  - static_call (still pending but still)
>  - ftrace fentry
> 
> We also need a solution for whole new chunks of text:
> 
>  - modules
>  - ftrace trampolines
>  - optprobe trampolines
>  - JIT stuff
> 
> but that is so far not included; I had some ideas about a /dev/mem based
> interface that would report new ranges and wait for acks (from open
> file-desc) before freeing them.

I think that it would be nice not to end up with two very different
interfaces for this. But I see this is still just an RFC, so maybe the
full picture will emerge once we solve more of these use-cases.

> >   * What consistency guarantees are needed by userspace? It's not clear to
> >     me how it correlates the new event with execution on other CPUs. Is this
> >     using timestamps or something else?
> 
> Something else :-)
> 
> Both CS/PT basically have a bit-stream encoding of taken / not-taken
> decisions. To decode they read the text until a conditional
> branch-point, then consume one bit from the stream and continue.
> 
> (note how unconditional branches -- jump_labels -- are expected to be
> correct in this scheme)
> 
> This means they need an exact representation of the text to be able to
> accurately decode.
> 
> This means timestamps or PRE/POST modify events are not acceptible.
> Because until the I-FLUSH has completed we do not know which CPU has
> which version of the instruction.
> 
> Instead we rely on exceptions; exceptions are differently encoded in the
> CS/PT data streams.
> 
> The scheme used is:
> 
>  - overwrite target instruction with an exception (INT3 on x86, BRK on arm)
>  - sync (IPI broadcast CPUID or I-FLUSH completion)

Hmm. Wouldn't this sync also need to drain the trace buffers for all other
CPUs so that we make sure that the upcoming TEXT_POKE event occurs after
all prior trace data, which could've been from before the breakpoint was
installed?

> at this point we know the instruction _will_ trap and CS/PT can observe
> this alternate flow. That is, the exception handler will emulate the
> instruction.
> 
>  - emit the TEXT_POKE event with both the old and new instruction
>    included
> 
>  - overwrite the target instruction with the new instruction
>  - sync
> 
> at this point the new instruction should be valid.
> 
> Using this scheme we can at all times follow the instruction flow.
> Either it is an exception and the exception encoding helps us navigate,
> or, on either size, we'll know the old/new instruction.
> 
> >   * What about module loading?
> 
> I mentioned that above; that is still pending.
> 
> > Finally, the whole point of the '_nosync()' stuff is that we don't have
> > to synchronise. Therefore, there is a window where you really can't tell
> > whether the instruction has been updated from the perspective of another
> > CPU.
> 
> Right, and much of that is preserved with the above scheme, nowhere do
> you have to wait/disturb other CPUs, except for the I-FLUSH completion.
> 
> Does this help?

Yes, thank you for explaining it!

Will