linux-kernel - Re: [RFC PATCH 5/8] uprobe/x86: Add support to optimize on top of emulated instructions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aSSdavSy_unRaEgF@redhat.com>
Date: Mon, 24 Nov 2025 19:01:14 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Jiri Olsa <jolsa@...nel.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
	x86@...nel.org, Song Liu <songliubraving@...com>,
	Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>,
	David Laight <David.Laight@...lab.com>
Subject: Re: [RFC PATCH 5/8] uprobe/x86: Add support to optimize on top of
 emulated instructions

Hi Jiri,

I am trying to understand this series, will try to read it more carefully
later...

(damn why do you always send the patches when I am on PTO? ;)

On 11/17, Jiri Olsa wrote:
>
>  struct arch_uprobe {
>  	union {
> -		u8			insn[MAX_UINSN_BYTES];
> +		u8			insn[5*MAX_UINSN_BYTES];

Hmm. OK, this matches the "for (i = 0; i < 5; i++)" loop in
opt_setup_xol_ops(), but do we really need this change? Please see
the question at the end.

> +static int opt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> +{
> +	unsigned long offset = insn->length;
> +	struct insn insnX;
> +	int i, ret;
> +
> +	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags))
> +		return -ENOSYS;

I think this logic needs some cleanups... If ARCH_UPROBE_FLAG_CAN_OPTIMIZE
is set by the caller, the it doesn't make sense to call xxx_setup_xol_ops(),
right? But lets forget it for now.

> +	ret = opt_setup_xol_insns(auprobe, &auprobe->opt.xol[0], insn);

I think this should go into the main loop, see below

> +	for (i = 1; i < 5; i++) {
> +		ret = uprobe_init_insn_offset(auprobe, offset, &insnX, true);
> +		if (ret)
> +			break;
> +		ret = opt_setup_xol_insns(auprobe, &auprobe->opt.xol[i], &insnX);
> +		if (ret)
> +			break;
> +		offset += insnX.length;
> +		auprobe->opt.cnt++;
> +		if (offset >= 5)
> +			goto optimize;
> +	}
> +
> +	return -ENOSYS;

I don't think -ENOSYS makes sense if opt_setup_xol_insns() succeeds at least once.
IOW, how about

	static int opt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
	{
		unsigned long offset = 0;
		struct insn insnX;
		int i, ret;

		if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags))
			return -ENOSYS;

		for (i = 0; i < 5; i++) {
			ret = opt_setup_xol_insns(auprobe, &auprobe->opt.xol[i], insn);
			if (ret)
				break;
			offset += insn->length;
			if (offset >= 5)
				break;

			insn = &insnX;
			ret = uprobe_init_insn_offset(auprobe, offset, insn, true);
			if (ret)
				break;
		}

		if (!offset)
			return -ENOSYS;

		if (offset >= 5) {
			auprobe->opt.cnt = i + 1;
			auprobe->xol.ops = &opt_xol_ops;
			set_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags);
			set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_EMULATE, &auprobe->flags);
		}

		return 0;
	}

?

This way the caller, arch_uprobe_analyze_insn(), doesn't need to call
push/mov/sub/_setup_xol_ops(), and the code looks a bit simpler to me.

No?

> +      * TODO perhaps we could 'emulate' nop, so there would be no need for
> +      * ARCH_UPROBE_FLAG_OPTIMIZE_EMULATE flag, because we would emulate
> +      * allways.

Agreed... and this connects to "this logic needs some cleanups" above.
I guess we need nop_setup_xol_ops() extracted from branch_setup_xol_ops()
but again, lets forget it for now.

-------------------------------------------------------------------------------
Now the main question. What if we avoid this change

	-             u8                      insn[MAX_UINSN_BYTES];
	+             u8                      insn[5*MAX_UINSN_BYTES];

mentioned above, and change opt_setup_xol_ops() to just do

	-	for (i = 0; i < 5; i++)
	+	for (i = 0;; i++)

?

The main loop stops when offset >= 5 anyway.

And. if auprobe->insn[offset:MAX_UINSN_BYTES] doesn't contain a full/valid
insn at the start, then uprobe_init_insn_offset()->insn_decode() should fail?

Most probably I missed something, but I can't understand this part.

Oleg.