Message-ID: <f6aca7df-3fc6-4676-bb5b-cb15eba8f97c@gmail.com>
Date: Sun, 11 Aug 2024 13:52:50 +1000
From: Zixian Cai <fzczx123@...il.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Adrian Hunter <adrian.hunter@...el.com>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 "Liang, Kan" <kan.liang@...ux.intel.com>, Ben Gainey <ben.gainey@....com>,
 Paran Lee <p4ranlee@...il.com>, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] perf script python: Add the ins_lat field to event
 handler

On 9/8/2024 23:36, Arnaldo Carvalho de Melo wrote:
> On Fri, Aug 09, 2024 at 08:01:36AM +0000, Zixian Cai wrote:
>> For example, when using the Alder Lake PMU memory load event, the
>> instruction latency is stored in ins_lat, while the cache latency
>> is stored in weight.
>>
>> This patch reports the ins_lat field for Python scripting.
> 
> So, how did you test this? I tried:

This is how I tested it.

My machine is running the 6.5.0-41-generic kernel from Ubuntu 22.04 LTS, and I used the distribution's perf to record.

$ grep -m1 'model name' /proc/cpuinfo
model name	: 12th Gen Intel(R) Core(TM) i9-12900KF

$ perf version
perf version 6.5.13

$ perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava
...
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7988 msec =====
[ perf record: Woken up 11 times to write data ]
[ perf record: Captured and wrote 3.530 MB perf.data (47646 samples) ]

$ ./perf evlist -v
cpu_core/mem-loads-aux/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x8203, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1
cpu_core/mem-loads,ldlat=30/: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x1cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, freq: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-loads,ldlat=30/P: type: 10, size: 136, config: 0x5d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_core/mem-stores/P: type: 4 (PERF_TYPE_RAW), size: 136, config: 0x2cd, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
cpu_atom/mem-stores/P: type: 10, size: 136, config: 0x6d0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:HG: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1

$ ./perf script -g python

Then add a process_event handler to the generated perf-script.py:

def process_event(params):
    if "cpu_core/mem-loads,ldlat" in params["ev_name"]:
        print(params["sample"]["weight"], params["sample"]["ins_lat"])
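For anyone reproducing this with a perf binary that may or may not include the patch, a more defensive variant of the handler could look like the sketch below. It assumes only the dictionary keys used above ("ev_name", "sample", "weight", "ins_lat"); the helper name format_latencies and the "N/A" fallback are choices made for this sketch, not part of perf.

```python
def format_latencies(params):
    """Return "weight ins_lat" for P-core load samples, or None otherwise.

    format_latencies is a hypothetical helper for this sketch, not a perf API.
    """
    if "cpu_core/mem-loads,ldlat" not in params.get("ev_name", ""):
        return None
    sample = params.get("sample", {})
    # ins_lat is only exported once the patch is applied; "N/A" is a
    # fallback chosen for this sketch so older builds do not raise KeyError.
    weight = sample.get("weight", "N/A")
    ins_lat = sample.get("ins_lat", "N/A")
    return "{} {}".format(weight, ins_lat)


def process_event(params):
    # Entry point that perf script calls for every sample.
    line = format_latencies(params)
    if line is not None:
        print(line)
```

With the patch applied this prints the same pairs as the handler above; without it, the second column reads N/A instead of failing.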

$ ./perf script|grep ldlat=|head
         taskset  182628 247517.778385:          1  cpu_core/mem-loads,ldlat=30/: ffffb33a850078a0     40268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  Addr               5              33               0 ffffffff8cc2ba08 [unknown] ([unknown])
         taskset  182628 247517.778409:          1  cpu_core/mem-loads,ldlat=30/: ffffb33a85007860     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5              85               0 ffffffff8ce23476 [unknown] ([unknown])
         taskset  182628 247517.778431:          3  cpu_core/mem-loads,ldlat=30/: ffffb33a85007b78     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5             163               0 ffffffff8d2061d0 [unknown] ([unknown])
         taskset  182628 247517.778444:          7  cpu_core/mem-loads,ldlat=30/: ffff90cf25b26280     10668100842 |OP LOAD|LVL L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A               96             120               0 ffffffff8dab2627 [unknown] ([unknown])
         taskset  182628 247517.778484:         23  cpu_core/mem-loads,ldlat=30/: ffffb33a85007cf0     10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A                5             218               0 ffffffff8cd96124 [unknown] ([unknown])
         taskset  182628 247517.778561:         39  cpu_core/mem-loads,ldlat=30/: ffffe271848b6600     20268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  Data               5             111               0 ffffffff8cd948cc [unknown] ([unknown])
         taskset  182628 247517.778629:         50  cpu_core/mem-loads,ldlat=30/: ffffe27184b6d280     11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A              71              73               0 ffffffff8cd94792 [unknown] ([unknown])
         taskset  182628 247517.778725:         67  cpu_core/mem-loads,ldlat=30/: ffff90c061ed6b48     11868100242 |OP LOAD|LVL LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A             240             242               0 ffffffff8cf9785b [unknown] ([unknown])
            java  182628 247517.778886:         81  cpu_core/mem-loads,ldlat=30/: ffffe27184888430     4026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK  Addr                  5              68               0 ffffffff8ce13245 [unknown] ([unknown])
            java  182628 247517.779164:         87  cpu_core/mem-loads,ldlat=30/: ffffe271bf9bca40     1026a100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK  N/A                   5              90               0 ffffffff8cd96387 [unknown] ([unknown])

$ ./perf script -s perf-script.py|head
in trace_begin
5 33
5 85
5 163
96 120
5 218
5 111
71 73
240 242
5 68

The output from the Python script matches the plain perf script output: both show the same weight and ins_lat values.
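The comparison can also be done mechanically rather than by eye. The sketch below assumes the text layout seen in the plain perf script output above, namely that weight, ins_lat and one further counter appear as three decimal columns immediately before the 16-digit hex sample IP. That layout is inferred from this one recording and may differ with other field selections; parse_latencies is a hypothetical helper name.

```python
import re

# Assumed layout (from the output above): three decimal columns
# (weight, ins_lat, one further counter) directly before the 16-digit hex IP.
_LAT_RE = re.compile(r"\b(\d+)\s+(\d+)\s+(\d+)\s+[0-9a-f]{16}\b")


def parse_latencies(line):
    """Return (weight, ins_lat) from one perf script text line, or None."""
    match = _LAT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    example = (
        "taskset  182628 247517.778444:          7  "
        "cpu_core/mem-loads,ldlat=30/: ffff90cf25b26280     10668100842 "
        "|OP LOAD|LVL L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK  N/A"
        "               96             120               0 "
        "ffffffff8dab2627 [unknown] ([unknown])"
    )
    print(parse_latencies(example))
```

Feeding each line of the text output through parse_latencies and comparing the pairs with what the Python handler printed gives a quick check that the two agree.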

> 
> But in general try to provide the steps to show that the functionality
> that you are adding is actually working, making it easy for other
> people to try reproducing your results.

Will do for future patches.

> Thanks,
> 
> - Arnaldo

One thing I haven't figured out: when I use a perf that I built from source, perf mem record doesn't seem to record the events for the Golden Cove P-cores.

$ ./perf version
perf version 6.11.0-rc2

$ ./perf mem record taskset -c 0-15 java -jar /usr/share/benchmarks/dacapo/dacapo-23.11-chopin.jar biojava

Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,unlimited]
Version: biojava 7.0.2 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin biojava starting =====
Processing sequences: 100%
===== DaCapo 23.11-chopin biojava PASSED in 7157 msec =====
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.251 MB perf.data ]

$ ./perf evlist -v
cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1

I think the above recording issue is orthogonal to this patch, and possibly a result of running 6.11 perf userland on a 6.5 kernel.
