Message-ID: <f6aca7df-3fc6-4676-bb5b-cb15eba8f97c@gmail.com>
Date: Sun, 11 Aug 2024 13:52:50 +1000
From: Zixian Cai <fzczx123@...il.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>, Ben Gainey <ben.gainey@....com>,
Paran Lee <p4ranlee@...il.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] perf script python: Add the ins_lat field to event
handler
On 9/8/2024 23:36, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
>> For example, when using the Alder Lake PMU memory load event, the
>> instruction latency is stored in ins_lat, while the cache latency
>> is stored in weight.
>>
>> This patch reports the ins_lat field for Python scripting.
>
> So, how did you test this? I tried:
This is how I tested it.
My machine runs the 6.5.0-41-generic kernel from Ubuntu 22.04 LTS, and I recorded with the distribution's perf.
$ grep -m1 'model name' /proc/cpuinfo
model name : 12th Gen Intel(R) Core(TM) i9-12900KF
$ perf version
perf version 6.5.13
$ perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
...
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7988 msec =====
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 3.530 MB perf.data (47646 samples) ]
$ ./perf evlist -v
cpu_core/mem-loads-aux/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x8203, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1
cpu_core/mem-loads,ldlat=30/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, freq: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-loads,ldlat=30/P: type: 10, size: 136, config: 0x5d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_core/mem-stores/P: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x2cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
cpu_atom/mem-stores/P: type: 10, size: 136, config: 0x6d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:HG: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
$ ./perf script -g python
Add a new handler function to perf-script.py:

def process_event(params):
    if "cpu_core/mem-loads,ldlat" in params["ev_name"]:
        print(params["sample"]["weight"], params["sample"]["ins_lat"])
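For anyone who wants a summary instead of one line per sample, here is a rough sketch of a variant handler. It is not part of the patch. The `latencies` dictionary and the output format are illustrative, and it assumes "ins_lat" may be missing from the sample dict when perf is built without this patch, so it uses .get() rather than indexing.

```python
from collections import defaultdict

# Illustrative container: event name -> list of (weight, ins_lat) pairs.
latencies = defaultdict(list)


def process_event(params):
    """Collect weight and ins_lat for every sample that carries both."""
    sample = params["sample"]
    # Assumption: either key may be absent (e.g. perf without this patch),
    # so look them up with .get() and skip incomplete samples.
    weight = sample.get("weight")
    ins_lat = sample.get("ins_lat")
    if weight is None or ins_lat is None:
        return
    latencies[params["ev_name"]].append((weight, ins_lat))


def trace_end():
    """Print per-event sample counts and average latencies."""
    for ev_name, pairs in sorted(latencies.items()):
        n = len(pairs)
        avg_weight = sum(w for w, _ in pairs) / n
        avg_ins_lat = sum(i for _, i in pairs) / n
        print(f"{ev_name}: samples={n} "
              f"avg_weight={avg_weight:.1f} avg_ins_lat={avg_ins_lat:.1f}")
```

perf script calls trace_end() once after the last event, so the summary is printed at the end of the run.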
$ ./perf script|grep ldlat=|head
taskset 182628 247517.778385: 1 cpu_core/mem-loads,ldlat=30/: ffffb33a850078a0 40268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK Addr 5 33 0 ffffffff8cc2ba08 [unknown] ([unknown])
taskset 182628 247517.778409: 1 cpu_core/mem-loads,ldlat=30/: ffffb33a85007860 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 85 0 ffffffff8ce23476 [unknown] ([unknown])
taskset 182628 247517.778431: 3 cpu_core/mem-loads,ldlat=30/: ffffb33a85007b78 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 163 0 ffffffff8d2061d0 [unknown] ([unknown])
taskset 182628 247517.778444: 7 cpu_core/mem-loads,ldlat=30/: ffff90cf25b26280 10668100842 |OP LOAD|LVL L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 96 120 0 ffffffff8dab2627 [unknown] ([unknown])
taskset 182628 247517.778484: 23 cpu_core/mem-loads,ldlat=30/: ffffb33a85007cf0 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 218 0 ffffffff8cd96124 [unknown] ([unknown])
taskset 182628 247517.778561: 39 cpu_core/mem-loads,ldlat=30/: ffffe271848b6600 20268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK Data 5 111 0 ffffffff8cd948cc [unknown] ([unknown])
taskset 182628 247517.778629: 50 cpu_core/mem-loads,ldlat=30/: ffffe27184b6d280 11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 71 73 0 ffffffff8cd94792 [unknown] ([unknown])
taskset 182628 247517.778725: 67 cpu_core/mem-loads,ldlat=30/: ffff90c061ed6b48 11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 240 242 0 ffffffff8cf9785b [unknown] ([unknown])
java 182628 247517.778886: 81 cpu_core/mem-loads,ldlat=30/: ffffe27184888430 4026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK Addr 5 68 0 ffffffff8ce13245 [unknown] ([unknown])
java 182628 247517.779164: 87 cpu_core/mem-loads,ldlat=30/: ffffe271bf9bca40 1026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK N/A 5 90 0 ffffffff8cd96387 [unknown] ([unknown])
$ ./perf script -s perf-script.py|head
in trace_begin
5 33
5 85
5 163
96 120
5 218
5 111
71 73
240 242
5 68
The output of the Python script matches that of plain perf script, with both showing weight and ins_lat.
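The comparison above was done by eye. A minimal sketch of automating it follows. It assumes the column layout seen in the listing above: the first two decimal fields after the "|BLK" decode are weight and ins_lat. The function names are hypothetical, and other perf versions or field selections may print a different layout.

```python
def weight_and_ins_lat(line):
    """Return (weight, ins_lat) parsed from one `perf script` line, or None.

    Assumes the layout shown in the listing: "... |BLK <decode> W I ...".
    """
    _, sep, tail = line.partition("|BLK")
    if not sep:
        return None
    tokens = tail.split()
    # Skip the textual BLK decode ("Addr", "Data", "N/A") and take the
    # first two consecutive decimal tokens.
    for i, tok in enumerate(tokens[:-1]):
        if tok.isdecimal():
            nxt = tokens[i + 1]
            return (int(tok), int(nxt)) if nxt.isdecimal() else None
    return None


def handler_pairs(lines):
    """Parse "weight ins_lat" lines printed by the Python handler."""
    pairs = []
    for line in lines:
        fields = line.split()
        # Lines such as "in trace_begin" are not numeric and are skipped.
        if len(fields) == 2 and all(f.isdecimal() for f in fields):
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs
```

Feeding the saved output of the two commands through these helpers and comparing the resulting lists over their common prefix gives the same check as reading the two listings side by side.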
>
> But in general try to provide the steps to show that the functionality
> that you are adding is actually working, making it easy for other
> people to try reproducing your results.
Will do for future patches.
> Thanks,
>
> - Arnaldo
One thing I haven't figured out: when I use a perf built from source, perf mem record doesn't seem to record the events for the Golden Cove P-cores.
$ ./perf version
perf version 6.11.0-rc2
$ ./perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7157 msec =====
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.251 MB perf.data ]
$ ./perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
I think the above recording issue is orthogonal to this patch, and possibly a result of running 6.11 perf userland on a 6.5 kernel.
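To see exactly which events are missing, the two `perf evlist -v` listings can be compared mechanically. This is only a sketch with hypothetical helper names. It assumes each event line has the form "<name>: type: ..." as in the listings above, and it compares names literally.

```python
def event_names(evlist_output):
    """Extract event names from `perf evlist -v` text output.

    Assumes every event line looks like "<name>: type: ...".
    """
    names = set()
    for line in evlist_output.splitlines():
        name, sep, _ = line.partition(": type:")
        if sep:
            names.add(name.strip())
    return names


def missing_events(reference_output, other_output):
    """Return event names present in the reference listing only."""
    return sorted(event_names(reference_output) - event_names(other_output))
```

Because names are compared literally, a difference in modifiers alone (for example dummy:HG versus dummy:u) is also reported as a missing event.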