linux-kernel - Re: [PATCH v6 4/8] x86: speed up int3-based patching using direct write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140114194519.743261c4@gandalf.local.home>
Date:	Tue, 14 Jan 2014 19:45:19 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Petr Mladek <pmladek@...e.cz>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Jiri Kosina <jkosina@...e.cz>, linux-kernel@...r.kernel.org,
	x86@...nel.org
Subject: Re: [PATCH v6 4/8] x86: speed up int3-based patching using direct
 write

On Tue, 10 Dec 2013 16:42:16 +0100
Petr Mladek <pmladek@...e.cz> wrote:

> When trying to use text_poke_bp_iter in ftrace, the result was slower than
> the original implementation.
> 
> It turned out that one difference was in text_poke vs. ftrace_write.
> text_poke did many extra operations to make sure that code was read-write
> and the change was atomic.
> 
> In fact, we do not need the atomic operation inside text_poke_bp_iter because
> the modified code is guarded by the int3 handler. The int3 interrupt itself
> is added and removed by an atomic operation because we modify only one byte.
> 
> Also if we patch many functions, it is more effective to set the whole text code
> read-write instead of remapping each address into a helper address space.
> 
> This patch adds text_poke_part which is inspired by ftrace_write.
> Then it is used in text_poke_bp_iter instead of the slower text_poke.
> It adds the limitation that text_poke_bp_iter user is responsible for
> setting the patched code read-write.
> 
> Note that we can't use it in text_poke because it is not effective
> to set the page or two read-write just because patching one piece.
> 
> For example, I tried to switch between 7 tracers: blk, branch, function_graph,
> wakeup_rt, irqsoff, function, and nop. Every tracer has also been enabled and
> disabled. With 100 cycles, I got these times with text_poke:
> 
>     real    25m7.380s     25m13.44s     25m9.072s
>     user    0m0.004s      0m0.004s      0m0.004s
>     sys     0m3.480s      0m3.508s      0m3.420s
> 
> and with text_poke_part:
> 
>     real    3m23.335s     3m24.422s     3m26.686s
>     user    0m0.004s      0m0.004s	0m0.004s
>     sys     0m3.724s      0m3.772s	0m3.588s
> 
> Signed-off-by: Petr Mladek <pmladek@...e.cz>
> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
> ---
>  arch/x86/kernel/alternative.c | 67 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 57 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 8e57ac03a0e8..03abf9abf681 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -595,6 +595,51 @@ void __kprobes text_poke_or_die(void *addr, const void *opcode, size_t len)
>  	BUG_ON(err);
>  }
>  
> +/**
> + * text_poke_part - Update part of the instruction on a live kernel when using
> + * 		    int3 breakpoint
> + * @addr:   address to modify
> + * @opcode: source of the copy
> + * @len:    length to copy
> + *
> + * If we patch instructions using int3 interrupt, we do not need to be so
> + * careful about an atomic write. Adding and removing the interrupt is atomic
> + * because we modify only one byte. The rest of the instruction could be
> + * modified in several steps because it is guarded by the interrupt handler.

Another nit, because I'm anal. I know Intel confuses things by calling
int3 an interrupt, when in reality it is a trap. There's nothing
asynchronous about it and nothing is being interrupted. Can we use the
term "trap" or "breakpoint" instead of "interrupt"?

> + * Hence we could use faster code here. See also text_poke_bp_iter below
> + * for more details.
> + *
> + * Note: This function must be called under text_mutex. Also the caller is
> + * responsible for setting the patched code read-write. This is more effective
> + * only when you patch many addresses at the same time.

s/many/several/

> + */
> +static int text_poke_part(void *addr, const void *opcode, size_t len)
> +{
> +	int ret;
> +
> +	/* The patching makes sense only for a text code */

s/for a/for/

-- Steve

> +	if (unlikely((unsigned long)addr < (unsigned long)_text) ||
> +	    unlikely((unsigned long)addr >= (unsigned long)_etext))
> +		return -EFAULT;
> +
> +#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
> +	/*
> +	 * On x86_64, the kernel text mapping is always mapped read-only with
> +	 * CONFIG_DEBUG_RODATA. Instead, we need to use the kernel identity
> +	 * mapping that can be set read-write, for example using
> +	 * set_kernel_text_rw().
> +	 */
> +	addr = __va(__pa_symbol(addr));
> +#endif
> +
> +	ret = probe_kernel_write(addr, opcode, len);
> +	if (unlikely(ret))
> +		return -EPERM;
> +
> +	sync_core();
> +	return 0;
> +}
> +
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/