Date:   Mon, 17 Feb 2020 22:10:27 +0100
From:   Jann Horn
To:     Josh Poimboeuf
Cc:     Peter Zijlstra,
        "the arch/x86 maintainers",
        kernel list,
        Ard Biesheuvel,
        Andy Lutomirski,
        Steven Rostedt,
        Ingo Molnar,
        Thomas Gleixner,
        Linus Torvalds,
        Masami Hiramatsu,
        Jason Baron, Jiri Kosina,
        David Laight,
        Borislav Petkov,
        Julia Cartwright, Jessica Yu,
        "H. Peter Anvin", Nadav Amit,
        Rasmus Villemoes,
        Edward Cree,
        Daniel Bristot de Oliveira
Subject: Re: [PATCH v3 0/6] Static calls

On Thu, Jan 10, 2019 at 9:52 PM Josh Poimboeuf wrote:
> On Thu, Jan 10, 2019 at 09:30:23PM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 09, 2019 at 04:59:35PM -0600, Josh Poimboeuf wrote:
> > > With this version, I stopped trying to use text_poke_bp(), and instead
> > > went with a different approach: if the call site destination doesn't
> > > cross a cacheline boundary, just do an atomic write.  Otherwise, keep
> > > using the trampoline indefinitely.
> >
> > > - Get rid of the use of text_poke_bp(), in favor of atomic writes.
> > >   Out-of-line calls will be promoted to inline only if the call sites
> > >   don't cross cache line boundaries. [Linus/Andy]
> >
> > Can we preserve why text_poke_bp() didn't work? I seem to have forgotten
> > again. The problem was poking the return address onto the stack from the
> > int3 handler, or something along those lines?
> Right, emulating a call instruction from the #BP handler is ugly,
> because you have to somehow grow the stack to make room for the return
> address.  Personally I liked the idea of shifting the iret frame by 16
> bytes in the #DB entry code, but others hated it.
> So many bad-but-not-completely-unacceptable options to choose from.
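
For anyone reading along without the earlier thread: the cacheline
condition in the quoted cover letter boils down to an address check.
Below is a minimal user-space sketch of that check. It assumes 64-byte
cache lines and a 5-byte call instruction whose 4-byte destination
starts at byte 1. The helper name and the constants are made up for
illustration and are not taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: common x86 cache line size */
#define CALL_DEST_OFFSET 1u /* destination follows the 1-byte opcode */
#define CALL_DEST_SIZE   4u /* rel32 destination field */

/*
 * Hypothetical helper: returns true if the 4-byte destination field of
 * a call instruction at insn_addr lies entirely within one cache line,
 * i.e. the case in which a single atomic write would be considered.
 */
static bool call_dest_in_one_cacheline(uintptr_t insn_addr)
{
	uintptr_t first = insn_addr + CALL_DEST_OFFSET;
	uintptr_t last = first + CALL_DEST_SIZE - 1;

	return (first / CACHE_LINE_SIZE) == (last / CACHE_LINE_SIZE);
}
```

With these assumptions, a call site at 0x1000 passes the check, while
one at 0x103c does not, because its destination bytes 0x103d..0x1040
straddle the line boundary at 0x1040.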

Silly suggestion from someone who has skimmed the thread:

Wouldn't a retpoline-style trampoline solve this without needing
memory allocations? Let the interrupt handler stash the destination in
a percpu variable and clear IF in regs->flags. Something like:

void simulate_call(unsigned long target) {
  __this_cpu_write(static_call_restore_if, (regs->flags & X86_EFLAGS_IF) != 0);
  regs->flags &= ~X86_EFLAGS_IF;
  __this_cpu_write(static_call_trampoline_source, regs->ip + 5);
  __this_cpu_write(static_call_trampoline_target, target);
  regs->ip = magic_static_call_trampoline;
}

magic_static_call_trampoline:
; set up return address for returning from target function
pushl PER_CPU_VAR(static_call_trampoline_source)
; set up retpoline-style return address
pushl PER_CPU_VAR(static_call_trampoline_target)
; restore flags if needed
cmp PER_CPU_VAR(static_call_restore_if), 0
je 1f
sti ; NOTE: percpu data must not be accessed past this point
1:
ret ; "return" to the call target

By using a return to implement the call, we don't need any scratch
registers for the call.
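
To make the control flow concrete, here is a toy user-space model of
the idea. It is only a sketch under stated assumptions: struct
toy_regs, the toy_* helpers and the MAGIC_TRAMPOLINE address are
invented for illustration, plain globals stand in for the percpu
variables, and regs->ip is taken to be the address of the 5-byte call
instruction. It does not model interrupts or real kernel entry code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF    0x200u      /* interrupt flag, bit 9 of EFLAGS */
#define CALL_INSN_SIZE   5u
#define STACK_WORDS      16
#define MAGIC_TRAMPOLINE 0xffff0000u /* arbitrary fake address */

/* Toy stand-in for the interrupted context (not struct pt_regs). */
struct toy_regs {
	uint64_t ip;
	uint64_t flags;
	uint64_t stack[STACK_WORDS];
	int sp; /* index of the top of stack; the stack grows downward */
};

/* Plain globals standing in for the percpu variables. */
static bool static_call_restore_if;
static uint64_t static_call_trampoline_source;
static uint64_t static_call_trampoline_target;

/* What the handler would do: stash state, divert to the trampoline. */
static void toy_simulate_call(struct toy_regs *regs, uint64_t target)
{
	static_call_restore_if = (regs->flags & X86_EFLAGS_IF) != 0;
	regs->flags &= ~(uint64_t)X86_EFLAGS_IF;
	static_call_trampoline_source = regs->ip + CALL_INSN_SIZE;
	static_call_trampoline_target = target;
	regs->ip = MAGIC_TRAMPOLINE;
}

static void toy_push(struct toy_regs *regs, uint64_t value)
{
	assert(regs->sp > 0);
	regs->stack[--regs->sp] = value;
}

static uint64_t toy_pop(struct toy_regs *regs)
{
	assert(regs->sp < STACK_WORDS);
	return regs->stack[regs->sp++];
}

/* What the trampoline would do once the handler has returned. */
static void toy_run_trampoline(struct toy_regs *regs)
{
	assert(regs->ip == MAGIC_TRAMPOLINE);
	toy_push(regs, static_call_trampoline_source); /* return address */
	toy_push(regs, static_call_trampoline_target); /* "ret" target */
	if (static_call_restore_if)
		regs->flags |= X86_EFLAGS_IF;          /* models sti */
	regs->ip = toy_pop(regs);                      /* models ret */
}
```

Running toy_simulate_call() followed by toy_run_trampoline() on a
context with ip = 0x1000 and IF set leaves ip at the target, IF set
again, and 0x1005 on top of the stack as the return address, without
the model ever touching a general-purpose register.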
