Open Source and information security mailing list archives
Message-ID: <20241213104536.GZ35539@noisy.programming.kicks-ass.net>
Date: Fri, 13 Dec 2024 11:45:36 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>, Andrii Nakryiko <andrii@...nel.org>,
	bpf@...r.kernel.org, Song Liu <songliubraving@...com>,
	Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>,
	Hao Luo <haoluo@...gle.com>, Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Alan Maguire <alan.maguire@...cle.com>,
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next 07/13] uprobes/x86: Add support to emulate nop5
 instruction

On Wed, Dec 11, 2024 at 02:33:56PM +0100, Jiri Olsa wrote:
> Adding support to emulate nop5 as the original uprobe instruction.
> 
> This speeds up uprobes on top of nop5 instructions:
> (results from benchs/run_bench_uprobes.sh)
> 
> current:
> 
>      uprobe-nop     :    3.252 ± 0.019M/s
>      uprobe-push    :    3.097 ± 0.002M/s
>      uprobe-ret     :    1.116 ± 0.001M/s
>  --> uprobe-nop5    :    1.115 ± 0.001M/s
>      uretprobe-nop  :    1.731 ± 0.016M/s
>      uretprobe-push :    1.673 ± 0.023M/s
>      uretprobe-ret  :    0.843 ± 0.009M/s
>  --> uretprobe-nop5 :    1.124 ± 0.001M/s
> 
> after the change:
> 
>      uprobe-nop     :    3.281 ± 0.003M/s
>      uprobe-push    :    3.085 ± 0.003M/s
>      uprobe-ret     :    1.130 ± 0.000M/s
>  --> uprobe-nop5    :    3.276 ± 0.007M/s
>      uretprobe-nop  :    1.716 ± 0.016M/s
>      uretprobe-push :    1.651 ± 0.017M/s
>      uretprobe-ret  :    0.846 ± 0.006M/s
>  --> uretprobe-nop5 :    3.279 ± 0.002M/s
> 
> Strangely I can see uretprobe-nop5 is now much faster compared to
> uretprobe-nop, while perf profiles for both are almost identical.
> I'm still checking on that.
> 
> Signed-off-by: Jiri Olsa <jolsa@...nel.org>
> ---
>  arch/x86/kernel/uprobes.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 23e4f2821cff..cdea97f8cd39 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -909,6 +909,11 @@ static const struct uprobe_xol_ops push_xol_ops = {
>  	.emulate  = push_emulate_op,
>  };
>  
> +static int is_nop5_insn(uprobe_opcode_t *insn)
> +{
> +	return !memcmp(insn, x86_nops[5], 5);
> +}
> +
>  /* Returns -ENOSYS if branch_xol_ops doesn't handle this insn */
>  static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
>  {
> @@ -928,6 +933,8 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
>  		break;
>  
>  	case 0x0f:
> +		if (is_nop5_insn((uprobe_opcode_t *) &auprobe->insn))
> +			goto setup;

This isn't right: this code is not x86_64 specific, and there are a bunch
of 32-bit 5-byte nops that do not start with 0f.

Also, since you already have the insn decoded, I would suggest you
simply check OPCODE2(insn) == 0x1f /* NOPL */ and length == 5.
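
The suggested check can be sketched in userspace roughly as follows. This is only an illustration: struct mock_insn and is_nopl5() are hypothetical names that stand in for the kernel's decoded struct insn, and only the fields the check needs are modelled. In uprobes.c the equivalent test would presumably be written against the decoded insn as OPCODE2(insn) == 0x1f together with insn->length == 5, inside the existing 0x0f case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's decoded instruction.
 * Only the fields needed for the check are modelled here.
 */
struct mock_insn {
	uint8_t opcode_bytes[3];	/* decoded opcode bytes */
	uint8_t opcode_nbytes;		/* number of opcode bytes */
	uint8_t length;			/* total instruction length */
};

/*
 * Return true for a 5-byte NOPL: two-byte opcode 0f 1f and a total
 * length of 5. The decoded fields are used instead of comparing the
 * raw bytes against one fixed nop encoding.
 */
static bool is_nopl5(const struct mock_insn *insn)
{
	return insn->opcode_nbytes == 2 &&
	       insn->opcode_bytes[0] == 0x0f &&
	       insn->opcode_bytes[1] == 0x1f &&
	       insn->length == 5;
}
```

Because the test keys on the decoded opcode and length, a 4-byte NOPL or a 5-byte instruction with a different opcode is rejected, and the check does not depend on which nop table the kernel was built with.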

>  		if (insn->opcode.nbytes != 2)
>  			return -ENOSYS;
>  		/*
> -- 
> 2.47.0
> 
