lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6uWUMLOG9uBDb3L@x1>
Date: Tue, 11 Feb 2025 19:26:24 +0100
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Howard Chu <howardchu95@...il.com>
Cc: Namhyung Kim <namhyung@...nel.org>, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v14 00/10] perf record --off-cpu: Dump off-cpu samples
 directly

On Tue, Feb 11, 2025 at 07:02:04PM +0100, Arnaldo Carvalho de Melo wrote:
> When running just one workload:
> 
> root@...ber:~# for a in `seq 10` ; do perf record --off-cpu sleep 1s ; perf script | grep offcpu-time ; done
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (13 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (16 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (14 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (12 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@...ber:~#
> 
> Have you tested it like that? Is this a known issue/current limitation?
> 
> I'm reading the patches to see I understand this better.

Ok, if I go back and use the example Namhyung added to the commit
message introducing off-cpu profiling:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=edc41a1099c2d08ccfd4ed7d59688501e3749015

We get offcpu-time samples:

root@...ber:~# perf record --off-cpu perf bench sched messaging -l 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.244 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.182 MB perf.data (43600 samples) ]
root@...ber:~# perf script | grep offcpu-time | wc -l
867
root@...ber:~# 

But they are all at the end:

sched-messaging  469373 17707.960698:    1055439 cpu_atom/cycles/P:  ffffffff8a49a453 mod_objcg_state+0xb3 ([kernel.kallsyms])
 sched-messaging  469373 17707.960944:    1057379 cpu_atom/cycles/P:  ffffffff8b222cf9 native_queued_spin_lock_slowpath+0x209 ([kernel.kallsyms])
 sched-messaging  469274 17707.961299:    6311728 cpu_core/cycles/P:      7fa751a510f1 __cxa_finalize+0x151 (/usr/lib64/libc.so.6)
         :469374  469374 [000] 18446744069.414584:   43388310       offcpu-time: 
            7fa751b1c1c4 [unknown] ([unknown])
                  4fb0e8 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

         :469518  469518 [000] 18446744069.414584:   22436206       offcpu-time: 
            7fa751b1c1c4 [unknown] ([unknown])
                  4fb0e8 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

         :469663  469663 [000] 18446744069.414584:  180198700       offcpu-time: 
            7fa751b1d304 [unknown] ([unknown])
                  4faa29 [unknown] ([unknown])
                  495700 [unknown] ([unknown])
                  495a1b [unknown] ([unknown])
                  413473 [unknown] ([unknown])
            7fa751a38088 [unknown] ([unknown])
            7fa751a3814b [unknown] ([unknown])
                  413ac5 [unknown] ([unknown])

possibly the threshold?

Tried:

root@...ber:~# perf record --off-cpu --off-cpu-thresh 10 perf bench sched messaging -l 1000

Didn't help.

When doing a system wide session with that 10 threshold I get the
offcpu-time samples intermixed, i.e. not just at the end:

root@...ber:~# perf record -a --off-cpu --off-cpu-thresh 10 perf bench sched messaging -l 1000

sched-messaging  472324 [007] 18587.393976:    1371787 cpu_core/cycles/P:  ffffffff8a7864d2 avc_has_perm+0x92 ([kernel.kallsyms])
         swapper       0 [023] 18587.394111:     286156 cpu_atom/cycles/P:  ffffffff8a16aacb do_idle+0x14b ([kernel.kallsyms])
 sched-messaging  472532 [016] 18587.394210:     294672 cpu_atom/cycles/P:  ffffffff8b4000ab entry_SYSCALL_64_safe_stack+0x0 ([kernel.kallsyms])
 sched-messaging  472324 [007] 18587.394226:    1371903 cpu_core/cycles/P:  ffffffff8a3f6308 copy_page_range+0x12a8 ([kernel.kallsyms])
          podman    3935 [012] 18587.394310:   10106482       offcpu-time: 
            5630f6bf2457 [unknown] (/usr/bin/podman)
            5630f6bc49e5 [unknown] (/usr/bin/podman)
            5630f6bbb1f5 [unknown] (/usr/bin/podman)
            5630f6bbb136 [unknown] (/usr/bin/podman)
            5630f6beeb65 [unknown] (/usr/bin/podman)
            7f4474032088 start_thread+0x3b8 (/usr/lib64/libc.so.6)
            7f44740b5f8c clone3+0x2c (/usr/lib64/libc.so.6)

 sched-messaging  472534 [016] 18587.394434:     279374 cpu_atom/cycles/P:  ffffffff8a3826c9 get_callchain_entry+0x9 ([kernel.kallsyms])
 sched-messaging  472324 [007] 18587.394476:    1371988 cpu_core/cycles/P:  ffffffff8a43d264 __kmalloc_node_noprof+0xd4 ([kernel.kallsyms])
         swapper       0 [016] 18587.394717:     276534 cpu_atom/cycles/P:  ffffffff8a1d2ef4 rcu_preempt_deferred_qs+0x4 ([kernel.kallsyms])
<SNIP>
 sched-messaging  472674 [013] 18587.414755:    6576247 cpu_core/cycles/P:  ffffffff8b222d7d native_queued_spin_lock_slowpath+0x28d ([kernel.kallsyms])
 sched-messaging  472625 [022] 18587.414760:   10699110       offcpu-time: 
            7f6b1a71d304 __GI___libc_write+0x14 (/usr/lib64/libc.so.6)
                  4fb0e8 bench_sched_messaging+0x218 (/home/acme/bin/perf)
                  4faa29 cmd_bench+0x2f9 (/home/acme/bin/perf)
                  495700 run_builtin+0x70 (/home/acme/bin/perf)
                  495a1b handle_internal_command+0xab (/home/acme/bin/perf)
                  413473 main+0x2f3 (/home/acme/bin/perf)
            7f6b1a638088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f6b1a63814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
                  413ac5 _start+0x25 (/home/acme/bin/perf)

 sched-messaging  472623 [023] 18587.414763:   10911814       offcpu-time: 
            7f6b1a71d304 __GI___libc_write+0x14 (/usr/lib64/libc.so.6)
                  4fb0e8 bench_sched_messaging+0x218 (/home/acme/bin/perf)
                  4faa29 cmd_bench+0x2f9 (/home/acme/bin/perf)
                  495700 run_builtin+0x70 (/home/acme/bin/perf)
                  495a1b handle_internal_command+0xab (/home/acme/bin/perf)
                  413473 main+0x2f3 (/home/acme/bin/perf)
            7f6b1a638088 __libc_start_call_main+0x78 (/usr/lib64/libc.so.6)
            7f6b1a63814b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
                  413ac5 _start+0x25 (/home/acme/bin/perf)
<SNIP>

So I'm guessing that running just the 'sleep' workload we are missing
the sample at some point while with the 'perf bench' we have way more
samples, etc.

Also the intermixed entries have backtraces resolved to symbols, while
the ones at the end don't have, something else to investigate.

- Arnaldo


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ