Date: Wed, 24 Apr 2024 12:12:26 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Howard Chu <howardchu95@...il.com>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org, 
	mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...nel.org, 
	irogers@...gle.com, adrian.hunter@...el.com, kan.liang@...ux.intel.com, 
	zegao2021@...il.com, leo.yan@...ux.dev, ravi.bangoria@....com, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	bpf@...r.kernel.org
Subject: Re: [PATCH v2 0/4] Dump off-cpu samples directly

Hello,

On Tue, Apr 23, 2024 at 7:46 PM Howard Chu <howardchu95@...il.com> wrote:
>
> As mentioned in: https://bugzilla.kernel.org/show_bug.cgi?id=207323
>
> Currently, off-cpu samples are dumped when perf record is exiting. This
> results in off-cpu samples being after the regular samples. Also, samples
> are stored in large BPF maps which contain all the stack traces and
> accumulated off-cpu time, but they are eventually going to fill up after
> running for an extensive period. This patch fixes those problems by dumping
> samples directly into perf ring buffer, and dispatching those samples to the
> correct format.

Thanks for working on this.

But the problem with dumping every sched-switch event is that these
events can be too frequent on loaded machines.  Copying that many events
into the ring buffer can cause other records to be lost.  Since perf
report doesn't care much about timing, I decided to aggregate the
results in a BPF map and dump them at the end of the profiling session.
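
To make the trade-off concrete, here is a rough userspace sketch of the
two approaches.  It is plain C, not the code in off_cpu.bpf.c; the
struct and function names and MAX_KEYS are made up for illustration.
Direct dumping emits one record per sched-switch, while aggregation
emits one record per distinct (pid, stack_id) key, at the cost of a
fixed-size table that can fill up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one sched-switch: who went off-CPU,
 * from which stack, and for how long. */
struct offcpu_event {
	uint32_t pid;
	uint32_t stack_id;
	uint64_t delta_ns;
};

#define MAX_KEYS 64	/* illustrative capacity, like a map's max_entries */

struct offcpu_entry {
	uint32_t pid;
	uint32_t stack_id;
	uint64_t total_ns;
	int used;
};

struct offcpu_map {
	struct offcpu_entry slots[MAX_KEYS];
	size_t nr_used;
};

static void offcpu_map_init(struct offcpu_map *map)
{
	memset(map, 0, sizeof(*map));
}

/* Accumulate delta_ns under (pid, stack_id) using linear probing.
 * Returns 0 on success, -1 when the table is full. */
static int offcpu_map_add(struct offcpu_map *map, const struct offcpu_event *ev)
{
	size_t start = (ev->pid * 31u + ev->stack_id) % MAX_KEYS;

	for (size_t i = 0; i < MAX_KEYS; i++) {
		struct offcpu_entry *e = &map->slots[(start + i) % MAX_KEYS];

		if (e->used && e->pid == ev->pid && e->stack_id == ev->stack_id) {
			e->total_ns += ev->delta_ns;
			return 0;
		}
		if (!e->used) {
			e->used = 1;
			e->pid = ev->pid;
			e->stack_id = ev->stack_id;
			e->total_ns = ev->delta_ns;
			map->nr_used++;
			return 0;
		}
	}
	return -1;
}

/* Direct dumping: every event becomes its own record. */
static size_t records_direct(size_t nr_events)
{
	return nr_events;
}

/* Aggregation: feed all events into the map and return how many
 * records a dump at the end would write (one per distinct key).
 * Events that do not fit are counted in *dropped. */
static size_t records_aggregated(struct offcpu_map *map,
				 const struct offcpu_event *evs, size_t nr_events,
				 size_t *dropped)
{
	*dropped = 0;
	for (size_t i = 0; i < nr_events; i++) {
		if (offcpu_map_add(map, &evs[i]) < 0)
			(*dropped)++;
	}
	return map->nr_used;
}

/* Look up the accumulated off-cpu time for a key; 0 if absent. */
static uint64_t offcpu_map_total(const struct offcpu_map *map,
				 uint32_t pid, uint32_t stack_id)
{
	for (size_t i = 0; i < MAX_KEYS; i++) {
		const struct offcpu_entry *e = &map->slots[i];

		if (e->used && e->pid == pid && e->stack_id == stack_id)
			return e->total_ns;
	}
	return 0;
}
```

With the three off-cpu samples from your "after" output (two from
upowerd with the same stack, one from perf), records_direct() gives 3
while records_aggregated() gives 2, and the gap grows with the switch
rate.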

Maybe that's not a concern for you (or on smaller systems).  In that
case, I think we should keep the original behavior and add a new option
(I'm not good at naming things, but maybe --off-cpu-sample?) that
selects the new behavior, rather than removing the old one.

Thanks,
Namhyung

>
> Before, off-cpu samples come after the regular samples:
>
> ```
>          swapper       0 [000] 963432.136150:    2812933    cycles:P:  ffffffffb7db1bc2 intel_idle+0x62 ([kernel.kallsyms])
>          swapper       0 [000] 963432.637911:    4932876    cycles:P:  ffffffffb7db1bc2 intel_idle+0x62 ([kernel.kallsyms])
>          swapper       0 [001] 963432.798072:    6273398    cycles:P:  ffffffffb7db1bc2 intel_idle+0x62 ([kernel.kallsyms])
>          swapper       0 [000] 963433.541152:    5279005    cycles:P:  ffffffffb7db1bc2 intel_idle+0x62 ([kernel.kallsyms])
> sh 1410180 [000] 18446744069.414584:    2528851 offcpu-time:
>             7837148e6e87 wait4+0x17 (/usr/lib/libc.so.6)
>
>
> sh 1410185 [000] 18446744069.414584:    2314223 offcpu-time:
>             7837148e6e87 wait4+0x17 (/usr/lib/libc.so.6)
>
>
> awk 1409644 [000] 18446744069.414584:     191785 offcpu-time:
>             702609d03681 read+0x11 (/usr/lib/libc.so.6)
>                   4a02a4 [unknown] ([unknown])
> ```
>
>
> After, regular samples (cycles:P) and off-cpu samples (offcpu-time) are
> collected simultaneously:
>
> ```
> upowerd     741 [000] 963757.428701:     297848 offcpu-time:
>             72b2da11e6bc read+0x4c (/usr/lib/libc.so.6)
>
>
>       irq/9-acpi      56 [000] 963757.429116:    8760875    cycles:P:  ffffffffb779849f acpi_os_read_port+0x2f ([kernel.kallsyms])
> upowerd     741 [000] 963757.429172:     459522 offcpu-time:
>             72b2da11e6bc read+0x4c (/usr/lib/libc.so.6)
>
>
>          swapper       0 [002] 963757.434529:    5759904    cycles:P:  ffffffffb7db1bc2 intel_idle+0x62 ([kernel.kallsyms])
> perf 1419260 [000] 963757.434550: 1001012116 offcpu-time:
>             7274e5d190bf __poll+0x4f (/usr/lib/libc.so.6)
>             591acfc5daf0 perf_evlist__poll+0x24 (/root/hw/perf-tools-next/tools/perf/perf)
>             591acfb1ca50 perf_evlist__poll_thread+0x160 (/root/hw/perf-tools-next/tools/perf/perf)
>             7274e5ca955a [unknown] (/usr/lib/libc.so.6)
> ```
>
> Here's a simple flowchart:
>
> [parse_event (sample type: PERF_SAMPLE_RAW)] --> [config (bind fds,
> sample_id, sample_type)] --> [off_cpu_strip (sample type: PERF_SAMPLE_RAW)] -->
> [record_done(hooks off_cpu_finish)] --> [prepare_parse(sample type: OFFCPU_SAMPLE_TYPES)]
>
> Changes in v2:
>  - Remove unnecessary comments.
>  - Rename function off_cpu_change_type to off_cpu_prepare_parse
>
> Howard Chu (4):
>   perf record off-cpu: Parse off-cpu event, change config location
>   perf record off-cpu: BPF perf_event_output on sched_switch
>   perf record off-cpu: extract off-cpu sample data from raw_data
>   perf record off-cpu: delete bound-to-fail test
>
>  tools/perf/builtin-record.c             |  98 +++++++++-
>  tools/perf/tests/shell/record_offcpu.sh |  29 ---
>  tools/perf/util/bpf_off_cpu.c           | 242 +++++++++++-------------
>  tools/perf/util/bpf_skel/off_cpu.bpf.c  | 163 +++++++++++++---
>  tools/perf/util/evsel.c                 |   8 -
>  tools/perf/util/off_cpu.h               |  14 +-
>  tools/perf/util/perf-hooks-list.h       |   1 +
>  7 files changed, 344 insertions(+), 211 deletions(-)
>
> --
> 2.44.0
>
