lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51DF6C8D.1080903@hitachi.com>
Date:	Fri, 12 Jul 2013 11:40:13 +0900
From:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
To:	Jiri Kosina <jkosina@...e.cz>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Jason Baron <jbaron@...mai.com>,
	"H. Peter Anvin" <hpa@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>, Joe Perches <joe@...ches.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2 v3] x86: introduce int3-based instruction patching

(2013/07/12 5:26), Jiri Kosina wrote:
> Introduce a method for run-time instrucntion patching on a live SMP kernel 
> based on int3 breakpoint, completely avoiding the need for stop_machine().
> 
> The way this is achieved:
> 
> 	- add a int3 trap to the address that will be patched
> 	- sync cores
> 	- update all but the first byte of the patched range
> 	- sync cores
> 	- replace the first byte (int3) by the first byte of
> 	  replacing opcode
> 	- sync cores
> 
> According to
> 
> 	http://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
> 
> synchronization after replacing "all but first" instructions should not
> be necessary (on Intel hardware), as the syncing after the subsequent
> patching of the first byte provides enough safety.
> But there's not only Intel HW out there, and we'd rather be on a safe
> side.
> 
> If any CPU instruction execution would collide with the patching,
> it'd be trapped by the int3 breakpoint and redirected to the provided
> "handler" (which would typically mean just skipping over the patched
> region, acting as "nop" has been there, in case we are doing nop -> jump
> and jump -> nop transitions).
> 
> Ftrace has been using this very technique since 08d636b ("ftrace/x86:
> Have arch x86_64 use breakpoints instead of stop machine") for ages
> already, and jump labels are another obvious potential user of this.
> 
> Based on activities of Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
> a few years ago.
> 
> Signed-off-by: Jiri Kosina <jkosina@...e.cz>

Looks good for me. I'd like to use this for optprobe too :)

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>

Thank you!

> ---
> Changes:
> 
> v1 -> v2:
>   + fixed kerneldoc
>   + fixed checkpatch errors (reported by Borislav)
> 
> v2 -> v3:
>   + fixed few typos (Joe Perches)
>   + extended changelog with pointer to Intel's statement regarding minimal 
>     necessary synchronization points to achieve correctness
>   + added even the synchronization that might not be needed according to 
>     Intel, to be on a safe side (and do what has been proven to work by 
>     ftrace implementation already) (Steven Rostedt)
>   + Fix formatting of comments (Steven Rostedt, Borislav Petkov, Joe 
>     Perches)
> 
>  arch/x86/include/asm/alternative.h |    1 +
>  arch/x86/kernel/alternative.c      |  107 ++++++++++++++++++++++++++++++++++++
>  kernel/kprobes.c                   |    2 +-
>  3 files changed, 109 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 58ed6d9..3abf8dd 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -233,6 +233,7 @@ struct text_poke_param {
>  };
>  
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
>  extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
>  extern void text_poke_smp_batch(struct text_poke_param *params, int n);
>  
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index c15cf9a..ed2377d 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -11,6 +11,7 @@
>  #include <linux/memory.h>
>  #include <linux/stop_machine.h>
>  #include <linux/slab.h>
> +#include <linux/kdebug.h>
>  #include <asm/alternative.h>
>  #include <asm/sections.h>
>  #include <asm/pgtable.h>
> @@ -596,6 +597,112 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
>  	return addr;
>  }
>  
> +static void do_sync_core(void *info)
> +{
> +	sync_core();
> +}
> +
> +static bool bp_patching_in_progress;
> +static void *bp_int3_handler, *bp_int3_addr;
> +
> +static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
> +{
> +	struct die_args *args = data;
> +
> +	/* bp_patching_in_progress */
> +	smp_rmb();
> +
> +	if (likely(!bp_patching_in_progress))
> +		return NOTIFY_DONE;
> +
> +	/* we are not interested in non-int3 faults and ring > 0 faults */
> +	if (val != DIE_INT3 || !args->regs || user_mode_vm(args->regs)
> +			    || (unsigned long) bp_int3_addr != args->regs->ip)
> +		return NOTIFY_DONE;
> +
> +	/* set up the specified breakpoint handler */
> +	args->regs->ip = (unsigned long) bp_int3_handler;
> +
> +	return NOTIFY_STOP;
> +}
> +/*
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr:	address to patch
> + * @opcode:	opcode of new instruction
> + * @len:	length to copy
> + * @handler:	address to jump to when the temporary breakpoint is hit
> + *
> +
> + * Modify multi-byte instruction by using int3 breakpoint on SMP.
> + * In contrary to text_poke_smp(), we completely avoid stop_machine() here,
> + * and achieve the synchronization using int3 breakpoint.
> + *
> + * The way it is done:
> + *	- add a int3 trap to the address that will be patched
> + *	- sync cores
> + *	- update all but the first byte of the patched range
> + *	- sync cores
> + *	- replace the first byte (int3) by the first byte of
> + *	  replacing opcode
> + *	- sync cores
> + *
> + * Note: must be called under text_mutex.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> +	unsigned char int3 = 0xcc;
> +
> +	bp_int3_handler = handler;
> +	bp_int3_addr = (u8 *)addr + sizeof(int3);
> +	bp_patching_in_progress = true;
> +	/*
> +	 * Corresponding read barrier in int3 notifier for
> +	 * making sure the in_progress flags is correctly ordered wrt.
> +	 * patching
> +	 */
> +	smp_wmb();
> +
> +	text_poke(addr, &int3, sizeof(int3));
> +
> +	on_each_cpu(do_sync_core, NULL, 1);
> +
> +	if (len - sizeof(int3) > 0) {
> +		/* patch all but the first byte */
> +		text_poke((char *)addr + sizeof(int3),
> +			  (const char *) opcode + sizeof(int3),
> +			  len - sizeof(int3));
> +		/*
> +		 * According to Intel, this core syncing is very likely
> +		 * not necessary and we'd be safe even without it. But
> +		 * better safe than sorry (plus there's not only Intel).
> +		 */
> +		on_each_cpu(do_sync_core, NULL, 1);
> +	}
> +
> +	/* patch the first byte */
> +	text_poke(addr, opcode, sizeof(int3));
> +
> +	on_each_cpu(do_sync_core, NULL, 1);
> +
> +	bp_patching_in_progress = false;
> +	smp_wmb();
> +
> +	return addr;
> +}
> +
> +/* this one needs to run before anything else handles it as a
> + * regular exception */
> +static struct notifier_block int3_nb = {
> +	.priority = 0x7fffffff,
> +	.notifier_call = int3_notify
> +};
> +
> +static int __init int3_init(void)
> +{
> +	return register_die_notifier(&int3_nb);
> +}
> +
> +arch_initcall(int3_init);
>  /*
>   * Cross-modifying kernel text with stop_machine().
>   * This code originally comes from immediate value.
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index bddf3b2..d6db7bd 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1709,7 +1709,7 @@ EXPORT_SYMBOL_GPL(unregister_kprobes);
>  
>  static struct notifier_block kprobe_exceptions_nb = {
>  	.notifier_call = kprobe_exceptions_notify,
> -	.priority = 0x7fffffff /* we need to be notified first */
> +	.priority = 0x7ffffff0 /* High priority, but not first.  */
>  };
>  
>  unsigned long __weak arch_deref_entry_point(void *entry)
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@...achi.com


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ