Message-ID: <CAEf4BzYDrVJXnAruko-h5-oXCGuZ92x4KnY-2cD=XXBp1U_kBg@mail.gmail.com>
Date: Tue, 9 Jul 2024 16:55:31 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Liao Chang <liaochang1@...wei.com>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org, 
	namhyung@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com, 
	jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com, 
	kan.liang@...ux.intel.com, ast@...nel.org, daniel@...earbox.net, 
	andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org, 
	yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org, 
	sdf@...ichev.me, haoluo@...gle.com, mykolal@...com, shuah@...nel.org, 
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	bpf@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 1/2] uprobes: Optimize the return_instance related routines

On Mon, Jul 8, 2024 at 6:00 PM Liao Chang <liaochang1@...wei.com> wrote:
>
> Reduce the runtime overhead for struct return_instance data managed by
> uretprobe. This patch replaces the dynamic allocation with a statically
> allocated array, leveraging two facts: the limited nesting depth of
> uretprobe (max 64) and the function-call style of return_instance usage
> (create at entry, free at exit).
>
> This patch has been tested on Kunpeng916 (Hi1616), 4 NUMA nodes, 64
> cores @ 2.4GHz. Redis benchmarks show a throughput gain of about 2% for Redis
> GET and SET commands:
>
> ------------------------------------------------------------------
> Test case       | No uretprobes | uretprobes     | uretprobes
>                 |               | (current)      | (optimized)
> ==================================================================
> Redis SET (RPS) | 47025         | 40619 (-13.6%) | 41529 (-11.6%)
> ------------------------------------------------------------------
> Redis GET (RPS) | 46715         | 41426 (-11.3%) | 42306 (-9.4%)
> ------------------------------------------------------------------
>
> Signed-off-by: Liao Chang <liaochang1@...wei.com>
> ---
>  include/linux/uprobes.h |  10 ++-
>  kernel/events/uprobes.c | 162 ++++++++++++++++++++++++----------------
>  2 files changed, 105 insertions(+), 67 deletions(-)
>

[...]

> +static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
> +                                    struct pt_regs *regs)
> +{
> +       struct return_frame *frame = &utask->frame;
> +       struct return_instance *ri = frame->return_instance;
> +       enum rp_check ctx = chained ? RP_CHECK_CHAIN_CALL : RP_CHECK_CALL;
> +
> +       while (ri && !arch_uretprobe_is_alive(ri, ctx, regs)) {
> +               ri = next_ret_instance(frame, ri);
> +               utask->depth--;
> +       }
> +       frame->return_instance = ri;
> +}
> +
> +static struct return_instance *alloc_return_instance(struct uprobe_task *task)
> +{
> +       struct return_frame *frame = &task->frame;
> +
> +       if (!frame->vaddr) {
> +               frame->vaddr = kcalloc(MAX_URETPROBE_DEPTH,
> +                               sizeof(struct return_instance), GFP_KERNEL);

Are you always pre-allocating MAX_URETPROBE_DEPTH instances? I.e.,
even if we need just one (because there is no recursion), you'd still
spend memory on all 64 of them?

That seems rather wasteful.

Have you considered using objpool for fast reuse across multiple CPUs?
Check lib/objpool.c.
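
To illustrate the reuse idea, here is a minimal user-space sketch of a
free-list pool. It is not the lib/objpool.c API and not kernel code:
struct ri_model, struct ri_pool, and the ri_pool_*() helpers are
hypothetical names invented for this example, and malloc() stands in for
the kernel allocator. It only shows that recycling objects keeps memory
proportional to the nesting depth actually reached, rather than to
MAX_URETPROBE_DEPTH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct return_instance. */
struct ri_model {
	unsigned long orig_ret_vaddr;	/* illustrative payload only */
	struct ri_model *next;		/* links the object into the free list */
};

/* Hypothetical pool: recycled objects plus a count of real allocations. */
struct ri_pool {
	struct ri_model *free_list;
	size_t allocated;		/* objects ever obtained from malloc() */
};

/* Take an object from the free list, allocating only when it is empty. */
static struct ri_model *ri_pool_get(struct ri_pool *pool)
{
	struct ri_model *ri = pool->free_list;

	if (ri) {
		pool->free_list = ri->next;
	} else {
		ri = malloc(sizeof(*ri));
		if (!ri)
			return NULL;
		pool->allocated++;
	}
	ri->orig_ret_vaddr = 0;
	ri->next = NULL;
	return ri;
}

/* Return an object to the pool for later reuse instead of freeing it. */
static void ri_pool_put(struct ri_pool *pool, struct ri_model *ri)
{
	assert(ri != NULL);
	ri->next = pool->free_list;
	pool->free_list = ri;
}

/* Release every cached object; call once when the pool is torn down. */
static void ri_pool_destroy(struct ri_pool *pool)
{
	while (pool->free_list) {
		struct ri_model *ri = pool->free_list;

		pool->free_list = ri->next;
		free(ri);
	}
}
```

With this model, a non-recursive workload that repeatedly does get/put
ends up with a single allocated object, and a workload nested three deep
ends up with three. The real objpool is additionally designed for
concurrent use across CPUs, which this single-threaded sketch does not
attempt to model.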

> +               if (!frame->vaddr)
> +                       return NULL;
> +       }
> +
> +       if (!frame->return_instance) {
> +               frame->return_instance = frame->vaddr;
> +               return frame->return_instance;
> +       }
> +
> +       return ++frame->return_instance;
> +}
> +
> +static inline bool return_frame_empty(struct uprobe_task *task)
> +{
> +       return !task->frame.return_instance;
>  }
>
>  /*

[...]
