lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1uUz6sU90IQQAyp@google.com>
Date: Thu, 12 Dec 2024 17:58:39 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Howard Chu <howardchu95@...il.com>
Cc: acme@...nel.org, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	James Clark <james.clark@...aro.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v13 09/10] perf record --off-cpu: Add --off-cpu-thresh
 option

On Thu, Dec 12, 2024 at 03:27:16PM -0800, Howard Chu wrote:
> Specify the threshold for dumping offcpu samples with --off-cpu-thresh,
> the unit is milliseconds. Default value is 500ms.
> 
> Example:
> 
>   perf record --off-cpu --off-cpu-thresh 824
> 
> The example above collects off-cpu samples where the off-cpu time is
> longer than 824ms
> 
> Suggested-by: Ian Rogers <irogers@...gle.com>
> Suggested-by: Namhyung Kim <namhyung@...nel.org>
> Suggested-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> Reviewed-by: Ian Rogers <irogers@...gle.com>
> Signed-off-by: Howard Chu <howardchu95@...il.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: James Clark <james.clark@...aro.org>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Kan Liang <kan.liang@...ux.intel.com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> ---
>  tools/perf/Documentation/perf-record.txt |  9 ++++++++
>  tools/perf/builtin-record.c              | 26 ++++++++++++++++++++++++
>  tools/perf/util/bpf_off_cpu.c            |  3 +++
>  tools/perf/util/bpf_skel/off_cpu.bpf.c   |  2 +-
>  tools/perf/util/off_cpu.h                |  1 +
>  tools/perf/util/record.h                 |  1 +
>  6 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 242223240a08..f3ac4c739d5f 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -829,6 +829,15 @@ filtered through the mask provided by -C option.
>  	only, as of now.  So the applications built without the frame
>  	pointer might see bogus addresses.
>  
> +	off-cpu profiling consists two types of samples: direct samples, which
> +	share the same behavior as regular samples, and the accumulated
> +	samples, stored in BPF stack trace map, presented after all the regular
> +	samples.
> +
> +--off-cpu-thresh::
> +	Once a task's off-cpu time reaches this threshold (in milliseconds), it
> +	generates a direct off-cpu sample. The default is 500ms.
> +
>  --setup-filter=<action>::
>  	Prepare BPF filter to be used by regular users.  The action should be
>  	either "pin" or "unpin".  The filter can be used after it's pinned.
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 0b637cea4850..62183a6857f2 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3147,6 +3147,28 @@ static int record__parse_mmap_pages(const struct option *opt,
>  	return ret;
>  }
>  
> +static int record__parse_off_cpu_thresh(const struct option *opt,
> +					const char *str,
> +					int unset __maybe_unused)
> +{
> +	struct record_opts *opts = opt->value;
> +	char *endptr;
> +	u64 off_cpu_thresh_ms; // converted to us for potential future improvements

I'm not sure about the future, but we now have the option in msec and
convert it to usec, and finally BPF uses nsec.  Is the usec conversion
really needed?  Maybe we can just use nsec internally.

Thanks,
Namhyung

> +
> +	if (!str)
> +		return -EINVAL;
> +
> +	off_cpu_thresh_ms = strtoull(str, &endptr, 10);
> +
> +	/* the threshold isn't string "0", yet strtoull() returns 0, parsing failed */
> +	if (*endptr || (off_cpu_thresh_ms == 0 && strcmp(str, "0")))
> +		return -EINVAL;
> +	else
> +		opts->off_cpu_thresh_us = off_cpu_thresh_ms * USEC_PER_MSEC;
> +
> +	return 0;
> +}
> +
>  void __weak arch__add_leaf_frame_record_opts(struct record_opts *opts __maybe_unused)
>  {
>  }
> @@ -3340,6 +3362,7 @@ static struct record record = {
>  		.ctl_fd              = -1,
>  		.ctl_fd_ack          = -1,
>  		.synth               = PERF_SYNTH_ALL,
> +		.off_cpu_thresh_us   = OFFCPU_THRESH,
>  	},
>  };
>  
> @@ -3562,6 +3585,9 @@ static struct option __record_options[] = {
>  	OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"),
>  	OPT_STRING(0, "setup-filter", &record.filter_action, "pin|unpin",
>  		   "BPF filter action"),
> +	OPT_CALLBACK(0, "off-cpu-thresh", &record.opts, "ms",
> +		     "Dump off-cpu samples if off-cpu time exceeds this threshold (in milliseconds). (Default: 500ms)",
> +		     record__parse_off_cpu_thresh),
>  	OPT_END()
>  };
>  
> diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
> index 2e7e4ae43ffc..c3ac19393c0f 100644
> --- a/tools/perf/util/bpf_off_cpu.c
> +++ b/tools/perf/util/bpf_off_cpu.c
> @@ -14,6 +14,7 @@
>  #include "util/strlist.h"
>  #include <bpf/bpf.h>
>  #include <internal/xyarray.h>
> +#include <linux/time64.h>
>  
>  #include "bpf_skel/off_cpu.skel.h"
>  
> @@ -286,6 +287,8 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target,
>  		}
>  	}
>  
> +	skel->bss->offcpu_thresh_ns = opts->off_cpu_thresh_us * NSEC_PER_USEC;
> +
>  	err = off_cpu_bpf__attach(skel);
>  	if (err) {
>  		pr_err("Failed to attach off-cpu BPF skeleton\n");
> diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> index 77fdc9e81db3..aae63d999abb 100644
> --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
> +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
> @@ -123,7 +123,7 @@ const volatile bool uses_cgroup_v1 = false;
>  
>  int perf_subsys_id = -1;
>  
> -__u64 offcpu_thresh_ns = 500000000ull;
> +__u64 offcpu_thresh_ns;
>  
>  /*
>   * Old kernel used to call it task_struct->state and now it's '__state'.
> diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h
> index 2a4b7f9b2c4c..f07ab2e36317 100644
> --- a/tools/perf/util/off_cpu.h
> +++ b/tools/perf/util/off_cpu.h
> @@ -16,6 +16,7 @@ struct record_opts;
>  			      PERF_SAMPLE_PERIOD | PERF_SAMPLE_RAW | \
>  			      PERF_SAMPLE_CGROUP)
>  
> +#define OFFCPU_THRESH 500000ull
>  
>  #ifdef HAVE_BPF_SKEL
>  int off_cpu_prepare(struct evlist *evlist, struct target *target,
> diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
> index a6566134e09e..2ca74add26c0 100644
> --- a/tools/perf/util/record.h
> +++ b/tools/perf/util/record.h
> @@ -79,6 +79,7 @@ struct record_opts {
>  	int	      synth;
>  	int	      threads_spec;
>  	const char    *threads_user_spec;
> +	u64	      off_cpu_thresh_us;
>  };
>  
>  extern const char * const *record_usage;
> -- 
> 2.43.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ