lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzsLYQiER7EXf5J0@krava>
Date: Mon, 18 Nov 2024 10:39:45 +0100
From: Jiri Olsa <olsajiri@...il.com>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
	Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>,
	Hao Luo <haoluo@...gle.com>, Steven Rostedt <rostedt@...dmis.org>,
	Alan Maguire <alan.maguire@...cle.com>,
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [RFC perf/core 07/11] uprobes/x86: Add support to optimize
 uprobes

On Mon, Nov 18, 2024 at 05:18:08PM +0900, Masami Hiramatsu wrote:
> On Tue,  5 Nov 2024 14:34:01 +0100
> Jiri Olsa <jolsa@...nel.org> wrote:
> 
> > Putting together all the previously added pieces to support optimized
> > uprobes on top of 5-byte nop instruction.
> > 
> > The current uprobe execution goes through following:
> >   - installs breakpoint instruction over original instruction
> >   - exception handler hit and calls related uprobe consumers
> >   - and either simulates original instruction or does out of line single step
> >     execution of it
> >   - returns to user space
> > 
> > The optimized uprobe path
> > 
> >   - checks the original instruction is 5-byte nop (plus other checks)
> >   - adds (or uses existing) user space trampoline and overwrites original
> >     instruction (5-byte nop) with call to user space trampoline
> >   - the user space trampoline executes uprobe syscall that calls related uprobe
> >     consumers
> >   - trampoline returns back to next instruction
> > 
> > This approach won't speed up all uprobes as it's limited to using nop5 as
> > original instruction, but we could use nop5 as USDT probe instruction (which
> > uses single byte nop ATM) and speed up the USDT probes.
> > 
> > This patch overloads related arch functions in uprobe_write_opcode and
> > set_orig_insn so they can install call instruction if needed.
> > 
> > The arch_uprobe_optimize triggers the uprobe optimization and is called after
> > first uprobe hit. I originally had it called on uprobe installation but then
> > it clashed with elf loader, because the user space trampoline was added in a
> > place where loader might need to put elf segments, so I decided to do it after
> > first uprobe hit when loading is done.
> > 
> > TODO release uprobe trampoline when it's no longer needed.. we might need to
> > stop all cpus to make sure no user space thread is in the trampoline.. or we
> > might just keep it, because there's just one 4GB memory region?
> > 
> > Signed-off-by: Jiri Olsa <jolsa@...nel.org>
> > ---
> >  arch/x86/include/asm/uprobes.h |   7 ++
> >  arch/x86/kernel/uprobes.c      | 130 +++++++++++++++++++++++++++++++++
> >  include/linux/uprobes.h        |   1 +
> >  kernel/events/uprobes.c        |   3 +
> >  4 files changed, 141 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> > index 678fb546f0a7..84a75ed748f0 100644
> > --- a/arch/x86/include/asm/uprobes.h
> > +++ b/arch/x86/include/asm/uprobes.h
> > @@ -20,6 +20,11 @@ typedef u8 uprobe_opcode_t;
> >  #define UPROBE_SWBP_INSN		0xcc
> >  #define UPROBE_SWBP_INSN_SIZE		   1
> >  
> > +enum {
> > +	ARCH_UPROBE_FLAG_CAN_OPTIMIZE	= 0,
> > +	ARCH_UPROBE_FLAG_OPTIMIZED	= 1,
> > +};
> > +
> >  struct uprobe_xol_ops;
> >  
> >  struct arch_uprobe {
> > @@ -45,6 +50,8 @@ struct arch_uprobe {
> >  			u8	ilen;
> >  		}			push;
> >  	};
> > +
> > +	unsigned long flags;
> >  };
> >  
> >  struct arch_uprobe_task {
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 02aa4519b677..50ccf24ff42c 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -18,6 +18,7 @@
> >  #include <asm/processor.h>
> >  #include <asm/insn.h>
> >  #include <asm/mmu_context.h>
> > +#include <asm/nops.h>
> >  
> >  /* Post-execution fixups. */
> >  
> > @@ -877,6 +878,33 @@ static const struct uprobe_xol_ops push_xol_ops = {
> >  	.emulate  = push_emulate_op,
> >  };
> >  
> > +static int is_nop5_insns(uprobe_opcode_t *insn)
> > +{
> > +	return !memcmp(insn, x86_nops[5], 5);
> 
> Maybe better to use BYTES_NOP5 directly?

ok

> 
> > +}
> > +
> > +static int is_call_insns(uprobe_opcode_t *insn)
> > +{
> > +	return *insn == 0xe8;
> 
> 0xe8 -> CALL_INSN_OPCODE

right, thanks

jirka

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ