Date:   Fri, 11 Jan 2019 21:56:05 +0000
From:   Nadav Amit <>
To:     Josh Poimboeuf <>
CC:     Linus Torvalds <>,
        Andy Lutomirski <>,
        Peter Zijlstra <>,
        the arch/x86 maintainers <>,
        Linux List Kernel Mailing <>,
        Ard Biesheuvel <>,
        Steven Rostedt <>,
        Ingo Molnar <>,
        Thomas Gleixner <>,
        Masami Hiramatsu <>,
        Jason Baron <>, Jiri Kosina <>,
        David Laight <>,
        Borislav Petkov <>,
        Julia Cartwright <>, Jessica Yu <>,
        "H. Peter Anvin" <>,
        Rasmus Villemoes <>,
        Edward Cree <>,
        Daniel Bristot de Oliveira <>
Subject: Re: [PATCH v3 0/6] Static calls

> On Jan 11, 2019, at 1:41 PM, Josh Poimboeuf <> wrote:
> On Fri, Jan 11, 2019 at 09:36:59PM +0000, Nadav Amit wrote:
>>> On Jan 11, 2019, at 1:22 PM, Josh Poimboeuf <> wrote:
>>> On Fri, Jan 11, 2019 at 12:46:39PM -0800, Linus Torvalds wrote:
>>>> On Fri, Jan 11, 2019 at 12:31 PM Josh Poimboeuf <> wrote:
>>>>> I was referring to the fact that a single static call key update will
>>>>> usually result in patching multiple call sites.  But you're right, it's
>>>>> only 1-2 trampolines per text_poke_bp() invocation.  Though eventually
>>>>> we may want to batch all the writes like what Daniel has proposed for
>>>>> jump labels, to reduce IPIs.
>>>> Yeah, my suggestion doesn't allow for batching, since it would
>>>> basically generate one trampoline for every rewritten instruction.
>>> As Andy said, I think batching would still be possible, it's just that
>>> we'd have to create multiple trampolines at a time.
>>> Or... we could do a hybrid approach: create a single custom trampoline
>>> which has the call destination patched in, but put the return address in
>>> %rax -- which is always clobbered, even for callee-saved PV ops.  Like:
>>> trampoline:
>>> 	push %rax
>>> 	call patched-dest
>>> That way the batching could be done with a single trampoline
>>> (particularly if using rcu-sched to avoid the sti hack).
>> I don’t see RCU-sched solves the problem if you don’t disable preemption. On
>> a fully preemptable kernel, you can get preempted between the push and the
>> call (jmp) or before the push. RCU-sched can then finish, and the preempted
>> task may later jump to a wrong patched-dest.
> Argh, I misspoke about RCU-sched.  Words are hard.
> I meant synchronize_rcu_tasks(), which is a completely different animal.
> My understanding is that it waits until all runnable tasks (including
> preempted tasks) have gotten a chance to run.

Actually, I just used the term you used, and thought about
synchronize_sched(). If you look at my patch [1], you’ll see I did something
similar using synchronize_sched(). But this required some delicate work of
restarting any preempted “optpoline” (or whatever name you want) block. 

[Note that my implementation has a terrible bug in this respect].

This is required since running a preempted task does not prevent it from
being preempted again without making any “real” progress.

If we want to adapt the same solution to static_calls, this means that in
retint_kernel (entry_64.S), you need to check whether you got preempted inside
the trampoline and, in that case, change the saved RIP back, before the

IMHO, sti+jmp is simpler.
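
For reference, the retint_kernel check described above could look roughly
like the following. This is only a userspace sketch of the RIP rewind, not
code from my patch: struct saved_regs, trampoline_fixup_ip() and the
start/end bounds are hypothetical names, and it assumes the block can be
safely re-executed from its first instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register frame; only the saved RIP matters here. */
struct saved_regs {
	uintptr_t ip;
};

/*
 * If the interrupted instruction pointer lies inside the trampoline
 * [start, end) but past its first instruction, rewind it to start so the
 * whole block is re-executed once the task runs again.
 * Returns true if the saved IP was rewound.
 */
static bool trampoline_fixup_ip(struct saved_regs *regs,
				uintptr_t start, uintptr_t end)
{
	assert(start < end);	/* sanity check on the assumed bounds */

	if (regs->ip > start && regs->ip < end) {
		regs->ip = start;
		return true;
	}
	return false;
}
```

A task interrupted at `start` itself has not executed anything in the block
yet, so it is left alone; anything at or past `end` is outside the block.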

