Message-ID: <CAG48ez0owFet0E43UAGd7sV9Oi0yhVpWTmy4W+Vm5+0q=74-DA@mail.gmail.com>
Date: Mon, 17 Feb 2020 22:10:27 +0100
From: Jann Horn <jannh@...gle.com>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
kernel list <linux-kernel@...r.kernel.org>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Andy Lutomirski <luto@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Jason Baron <jbaron@...mai.com>, Jiri Kosina <jkosina@...e.cz>,
David Laight <David.Laight@...lab.com>,
Borislav Petkov <bp@...en8.de>,
Julia Cartwright <julia@...com>, Jessica Yu <jeyu@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, Nadav Amit <namit@...are.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Edward Cree <ecree@...arflare.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3 0/6] Static calls
On Thu, Jan 10, 2019 at 9:52 PM Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> On Thu, Jan 10, 2019 at 09:30:23PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 09, 2019 at 04:59:35PM -0600, Josh Poimboeuf wrote:
> > > With this version, I stopped trying to use text_poke_bp(), and instead
> > > went with a different approach: if the call site destination doesn't
> > > cross a cacheline boundary, just do an atomic write. Otherwise, keep
> > > using the trampoline indefinitely.
> >
> > > - Get rid of the use of text_poke_bp(), in favor of atomic writes.
> > > Out-of-line calls will be promoted to inline only if the call sites
> > > don't cross cache line boundaries. [Linus/Andy]
> >
> > Can we preserve why text_poke_bp() didn't work? I seem to have forgotten
> > again. The problem was poking the return address onto the stack from the
> > int3 handler, or something along those lines?
>
> Right, emulating a call instruction from the #BP handler is ugly,
> because you have to somehow grow the stack to make room for the return
> address. Personally I liked the idea of shifting the iret frame by 16
> bytes in the #DB entry code, but others hated it.
>
> So many bad-but-not-completely-unacceptable options to choose from.

Silly suggestion from someone who has skimmed the thread:
Wouldn't a retpoline-style trampoline solve this without needing
memory allocations? Let the interrupt handler stash the destination in
a percpu variable and clear IF in regs->flags. Something like:
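
void simulate_call(struct pt_regs *regs, unsigned long target) {
  /* remember whether IF was set, then keep interrupts off until the
     trampoline is done with percpu data */
  __this_cpu_write(static_call_restore_if, (regs->flags & X86_EFLAGS_IF) != 0);
  regs->flags &= ~X86_EFLAGS_IF;
  /* return address for the emulated 5-byte call */
  __this_cpu_write(static_call_trampoline_source, regs->ip + 5);
  __this_cpu_write(static_call_trampoline_target, target);
  regs->ip = (unsigned long)magic_static_call_trampoline;
}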

magic_static_call_trampoline:
  ; set up return address for returning from target function
  pushl PER_CPU_VAR(static_call_trampoline_source)
  ; set up retpoline-style return address
  pushl PER_CPU_VAR(static_call_trampoline_target)
  ; restore flags if needed
  cmp PER_CPU_VAR(static_call_restore_if), 0
  je 1f
  sti ; NOTE: percpu data must not be accessed past this point
1:
  ret ; "return" to the call target

By using a return to implement the call, we don't need any scratch
registers for the call.
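
To make the stack behaviour concrete, here is a toy userspace model of
the push/push/ret sequence. It is only a sketch, not kernel code: the
struct toy_cpu type and the toy_* helpers are made up for illustration,
and the "stack" is just an array. The point it tries to show is that the
only state touched is the stack and the instruction pointer, and that
the return address is left on the stack for the target's own ret.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy machine: an instruction pointer plus a small stack. */
struct toy_cpu {
	unsigned long ip;
	unsigned long stack[8];
	size_t sp;		/* number of entries currently on the stack */
};

static void toy_push(struct toy_cpu *cpu, unsigned long value)
{
	assert(cpu->sp < 8);
	cpu->stack[cpu->sp++] = value;
}

/* Models "ret": pop the top of the stack into the instruction pointer. */
static void toy_ret(struct toy_cpu *cpu)
{
	assert(cpu->sp > 0);
	cpu->ip = cpu->stack[--cpu->sp];
}

/*
 * Models the trampoline: push the return address, push the call target,
 * then "return" into the target. No general-purpose register is needed.
 */
static void toy_trampoline(struct toy_cpu *cpu, unsigned long source,
			   unsigned long target)
{
	toy_push(cpu, source);	/* where the target should return to */
	toy_push(cpu, target);	/* consumed by the ret below */
	toy_ret(cpu);		/* ip = target, source stays on the stack */
}
```

After toy_trampoline(&cpu, source, target), cpu.ip holds target and the
stack holds exactly one entry, source, which is the state a real call
instruction would have produced. A following toy_ret() then models the
target function returning to the original call site.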