[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241022130103.3509201a2a26c4be5de1a852@kernel.org>
Date: Tue, 22 Oct 2024 13:01:03 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Liao Chang <liaochang1@...wei.com>
Cc: <oleg@...hat.com>, <peterz@...radead.org>, <mingo@...hat.com>,
<acme@...nel.org>, <namhyung@...nel.org>, <mark.rutland@....com>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...nel.org>,
<irogers@...gle.com>, <adrian.hunter@...el.com>,
<kan.liang@...ux.intel.com>, <linux-kernel@...r.kernel.org>,
<linux-trace-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
<bpf@...r.kernel.org>
Subject: Re: [PATCH v3 2/2] uprobes: Remove the spinlock within
handle_singlestep()
Hi Liao,
On Thu, 15 Aug 2024 01:46:29 +0000
Liao Chang <liaochang1@...wei.com> wrote:
> This patch introduces a flag to track TIF_SIGPENDING is suppress
> temporarily during the uprobe single-step. Upon uprobe singlestep is
> handled and the flag is confirmed, it could resume the TIF_SIGPENDING
> directly without acquiring the siglock in most case, then reducing
> contention and improving overall performance.
>
> I've use the script developed by Andrii in [1] to run benchmark. The CPU
> used was Kunpeng916 (Hi1616), 4 NUMA nodes, 64 cores@...GHz running the
> kernel on next tree + the optimization for get_xol_insn_slot() [2].
>
> before-opt
> ----------
> uprobe-nop ( 1 cpus): 0.907 ± 0.003M/s ( 0.907M/s/cpu)
> uprobe-nop ( 2 cpus): 1.676 ± 0.008M/s ( 0.838M/s/cpu)
> uprobe-nop ( 4 cpus): 3.210 ± 0.003M/s ( 0.802M/s/cpu)
> uprobe-nop ( 8 cpus): 4.457 ± 0.003M/s ( 0.557M/s/cpu)
> uprobe-nop (16 cpus): 3.724 ± 0.011M/s ( 0.233M/s/cpu)
> uprobe-nop (32 cpus): 2.761 ± 0.003M/s ( 0.086M/s/cpu)
> uprobe-nop (64 cpus): 1.293 ± 0.015M/s ( 0.020M/s/cpu)
>
> uprobe-push ( 1 cpus): 0.883 ± 0.001M/s ( 0.883M/s/cpu)
> uprobe-push ( 2 cpus): 1.642 ± 0.005M/s ( 0.821M/s/cpu)
> uprobe-push ( 4 cpus): 3.086 ± 0.002M/s ( 0.771M/s/cpu)
> uprobe-push ( 8 cpus): 3.390 ± 0.003M/s ( 0.424M/s/cpu)
> uprobe-push (16 cpus): 2.652 ± 0.005M/s ( 0.166M/s/cpu)
> uprobe-push (32 cpus): 2.713 ± 0.005M/s ( 0.085M/s/cpu)
> uprobe-push (64 cpus): 1.313 ± 0.009M/s ( 0.021M/s/cpu)
>
> uprobe-ret ( 1 cpus): 1.774 ± 0.000M/s ( 1.774M/s/cpu)
> uprobe-ret ( 2 cpus): 3.350 ± 0.001M/s ( 1.675M/s/cpu)
> uprobe-ret ( 4 cpus): 6.604 ± 0.000M/s ( 1.651M/s/cpu)
> uprobe-ret ( 8 cpus): 6.706 ± 0.005M/s ( 0.838M/s/cpu)
> uprobe-ret (16 cpus): 5.231 ± 0.001M/s ( 0.327M/s/cpu)
> uprobe-ret (32 cpus): 5.743 ± 0.003M/s ( 0.179M/s/cpu)
> uprobe-ret (64 cpus): 4.726 ± 0.016M/s ( 0.074M/s/cpu)
>
> after-opt
> ---------
> uprobe-nop ( 1 cpus): 0.985 ± 0.002M/s ( 0.985M/s/cpu)
> uprobe-nop ( 2 cpus): 1.773 ± 0.005M/s ( 0.887M/s/cpu)
> uprobe-nop ( 4 cpus): 3.304 ± 0.001M/s ( 0.826M/s/cpu)
> uprobe-nop ( 8 cpus): 5.328 ± 0.002M/s ( 0.666M/s/cpu)
> uprobe-nop (16 cpus): 6.475 ± 0.002M/s ( 0.405M/s/cpu)
> uprobe-nop (32 cpus): 4.831 ± 0.082M/s ( 0.151M/s/cpu)
> uprobe-nop (64 cpus): 2.564 ± 0.053M/s ( 0.040M/s/cpu)
>
> uprobe-push ( 1 cpus): 0.964 ± 0.001M/s ( 0.964M/s/cpu)
> uprobe-push ( 2 cpus): 1.766 ± 0.002M/s ( 0.883M/s/cpu)
> uprobe-push ( 4 cpus): 3.290 ± 0.009M/s ( 0.823M/s/cpu)
> uprobe-push ( 8 cpus): 4.670 ± 0.002M/s ( 0.584M/s/cpu)
> uprobe-push (16 cpus): 5.197 ± 0.004M/s ( 0.325M/s/cpu)
> uprobe-push (32 cpus): 5.068 ± 0.161M/s ( 0.158M/s/cpu)
> uprobe-push (64 cpus): 2.605 ± 0.026M/s ( 0.041M/s/cpu)
>
> uprobe-ret ( 1 cpus): 1.833 ± 0.001M/s ( 1.833M/s/cpu)
> uprobe-ret ( 2 cpus): 3.384 ± 0.003M/s ( 1.692M/s/cpu)
> uprobe-ret ( 4 cpus): 6.677 ± 0.004M/s ( 1.669M/s/cpu)
> uprobe-ret ( 8 cpus): 6.854 ± 0.005M/s ( 0.857M/s/cpu)
> uprobe-ret (16 cpus): 6.508 ± 0.006M/s ( 0.407M/s/cpu)
> uprobe-ret (32 cpus): 5.793 ± 0.009M/s ( 0.181M/s/cpu)
> uprobe-ret (64 cpus): 4.743 ± 0.016M/s ( 0.074M/s/cpu)
>
> Above benchmark results demonstrates a obivious improvement in the
> scalability of trig-uprobe-nop and trig-uprobe-push, the peak throughput
> of which are from 4.5M/s to 6.4M/s and 3.3M/s to 5.1M/s individually.
>
> [1] https://lore.kernel.org/all/20240731214256.3588718-1-andrii@kernel.org
> [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaochang1@huawei.com
>
This looks good to me.
Acked-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
Thanks,
> Acked-by: Oleg Nesterov <oleg@...hat.com>
> Signed-off-by: Liao Chang <liaochang1@...wei.com>
> ---
> include/linux/uprobes.h | 1 +
> kernel/events/uprobes.c | 8 +++++---
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index b503fafb7fb3..e4f57117d9c3 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -75,6 +75,7 @@ struct uprobe_task {
>
> struct uprobe *active_uprobe;
> unsigned long xol_vaddr;
> + bool signal_denied;
>
> struct return_instance *return_instances;
> unsigned int depth;
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 76a51a1f51e2..589aa2af1a99 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1979,6 +1979,7 @@ bool uprobe_deny_signal(void)
> WARN_ON_ONCE(utask->state != UTASK_SSTEP);
>
> if (task_sigpending(t)) {
> + utask->signal_denied = true;
> clear_tsk_thread_flag(t, TIF_SIGPENDING);
>
> if (__fatal_signal_pending(t) || arch_uprobe_xol_was_trapped(t)) {
> @@ -2288,9 +2289,10 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
> utask->state = UTASK_RUNNING;
> xol_free_insn_slot(current);
>
> - spin_lock_irq(¤t->sighand->siglock);
> - recalc_sigpending(); /* see uprobe_deny_signal() */
> - spin_unlock_irq(¤t->sighand->siglock);
> + if (utask->signal_denied) {
> + set_thread_flag(TIF_SIGPENDING);
> + utask->signal_denied = false;
> + }
>
> if (unlikely(err)) {
> uprobe_warn(current, "execute the probed insn, sending SIGILL.");
> --
> 2.34.1
>
>
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>
Powered by blists - more mailing lists