[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1uUz6sU90IQQAyp@google.com>
Date: Thu, 12 Dec 2024 17:58:39 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Howard Chu <howardchu95@...il.com>
Cc: acme@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
irogers@...gle.com, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
James Clark <james.clark@...aro.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v13 09/10] perf record --off-cpu: Add --off-cpu-thresh
option
On Thu, Dec 12, 2024 at 03:27:16PM -0800, Howard Chu wrote:
> Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
> the unit is milliseconds. Default value is 500ms.
>
> Example:
>
> perf record --off-cpu --off-cpu-thresh 824
>
> The example above collects off-cpu samples where the off-cpu time is
> longer than 824ms
>
> Suggested-by: Ian Rogers <irogers@...gle.com>
> Suggested-by: Namhyung Kim <namhyung@...nel.org>
> Suggested-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> Reviewed-by: Ian Rogers <irogers@...gle.com>
> Signed-off-by: Howard Chu <howardchu95@...il.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: James Clark <james.clark@...aro.org>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> ---
> tools/perf/Documentation/perf-record.txt | 9 ++++++++
> tools/perf/builtin-record.c | 26 ++++++++++++++++++++++++
> tools/perf/util/bpf_off_cpu.c | 3 +++
> tools/perf/util/bpf_skel/off_cpu.bpf.c | 2 +-
> tools/perf/util/off_cpu.h | 1 +
> tools/perf/util/record.h | 1 +
> 6 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 242223240a08..f3ac4c739d5f 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -829,6 +829,15 @@ filtered through the mask provided by -C option.
> only, as of now. So the applications built without the frame
> pointer might see bogus addresses.
>
> + off-cpu profiling consists two types of samples: direct samples, which
> + share the same behavior as regular samples, and the accumulated
> + samples, stored in BPF stack trace map, presented after all the regular
> + samples.
> +
> +--off-cpu-thresh::
> + Once a task's off-cpu time reaches this threshold (in milliseconds), it
> + generates a direct off-cpu sample. The default is 500ms.
> +
> --setup-filter=<action>::
> Prepare BPF filter to be used by regular users. The action should be
> either "pin" or "unpin". The filter can be used after it's pinned.
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 0b637cea4850..62183a6857f2 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3147,6 +3147,28 @@ static int record__parse_mmap_pages(const struct option *opt,
> return ret;
> }
>
> +static int record__parse_off_cpu_thresh(const struct option *opt,
> + const char *str,
> + int unset __maybe_unused)
> +{
> + struct record_opts *opts = opt->value;
> + char *endptr;
> + u64 off_cpu_thresh_ms; // converted to us for potential future improvements
I'm not sure about the future, but we now have the option in msec and
convert it to usec, and finally BPF uses nsec. Is the usec conversion
really needed? Maybe we can just use nsec internally.
Thanks,
Namhyung
> +
> + if (!str)
> + return -EINVAL;
> +
> + off_cpu_thresh_ms = strtoull(str, &endptr, 10);
> +
> + /* the threshold isn't string "0", yet strtoull() returns 0, parsing failed */
> + if (*endptr || (off_cpu_thresh_ms == 0 && strcmp(str, "0")))
> + return -EINVAL;
> + else
> + opts->off_cpu_thresh_us = off_cpu_thresh_ms * USEC_PER_MSEC;
> +
> + return 0;
> +}
> +
> void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
> {
> }
> @@ -3340,6 +3362,7 @@ static struct record record = {
> .ctl_fd = -1,
> .ctl_fd_ack = -1,
> .synth = PERF_SYNTH_ALL,
> + .off_cpu_thresh_us = OFFCPU_THRESH,
> },
> };
>
> @@ -3562,6 +3585,9 @@ static struct option __record_options[] = {
> OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"),
> OPT_STRING(0, "setup-filter", &record.filter_action, "pin|unpin",
> "BPF filter action"),
> + OPT_CALLBACK(0, "off-cpu-thresh", &record.opts, "ms",
> + "Dump off-cpu samples if off-cpu time exceeds this threshold (in milliseconds). (Default: 500ms)",
> + record__parse_off_cpu_thresh),
> OPT_END()
> };
>
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index 2e7e4ae43ffc..c3ac19393c0f 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -14,6 +14,7 @@
> #include "util/strlist.h"
> #include <bpf/bpf.h>
> #include <internal/xyarray.h>
> +#include <linux/time64.h>
>
> #include "bpf_skel/off_cpu.skel.h"
>
> @@ -286,6 +287,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
> }
> }
>
> + skel->bss->offcpu_thresh_ns = opts->off_cpu_thresh_us * NSEC_PER_USEC;
> +
> err = off_cpu_bpf__attach(skel);
> if (err) {
> pr_err("Failed to attach off-cpu BPF skeleton\n");
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index 77fdc9e81db3..aae63d999abb 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -123,7 +123,7 @@ const volatile bool uses_cgroup_v1 = false;
>
> int perf_subsys_id = -1;
>
> -__u64 offcpu_thresh_ns = 500000000ull;
> +__u64 offcpu_thresh_ns;
>
> /*
> * Old kernel used to call it task_struct->state and now it's '__state'.
> diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
> index 2a4b7f9b2c4c..f07ab2e36317 100644
> --- a/tools/perf/util/off_cpu.h
> +++ b/tools/perf/util/off_cpu.h
> @@ -16,6 +16,7 @@ struct record_opts;
> PERF_SAMPLE_PERIOD | PERF_SAMPLE_RAW | \
> PERF_SAMPLE_CGROUP)
>
> +#define OFFCPU_THRESH 500000ull
>
> #ifdef HAVE_BPF_SKEL
> int off_cpu_prepare(struct evlist *evlist, struct target *target,
> diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
> index a6566134e09e..2ca74add26c0 100644
> --- a/tools/perf/util/record.h
> +++ b/tools/perf/util/record.h
> @@ -79,6 +79,7 @@ struct record_opts {
> int synth;
> int threads_spec;
> const char *threads_user_spec;
> + u64 off_cpu_thresh_us;
> };
>
> extern const char * const *record_usage;
> --
> 2.43.0
>
Powered by blists - more mailing lists