linux-kernel - Re: perf mem record not getting the mem_load

Message-ID: <caba86ad-bcba-4e1e-acc4-b18d769db87d@linux.intel.com>
Date: Thu, 5 Sep 2024 13:23:27 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Ian Rogers <irogers@...gle.com>, Namhyung Kim <namhyung@...nel.org>,
 Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
 linux-perf-users@...r.kernel.org,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: perf mem record not getting the mem_load_aux events by default



On 2024-09-04 11:34 a.m., Arnaldo Carvalho de Melo wrote:
> On Wed, Sep 04, 2024 at 11:20:57AM -0400, Liang, Kan wrote:
>>
>>
>> On 2024-09-04 10:30 a.m., Arnaldo Carvalho de Melo wrote:
>>> Hi Kan,
>>>
>>> Recently I presented about 'perf mem record' and found that I had to use
>>> 'perf record' directly, as 'perf mem record' on an Intel Hybrid system
>>> wasn't selecting the required aux event:
>>>
>>>   http://vger.kernel.org/~acme/prez/lsfmm-bpf-2024/#/19
>>>
>>> The previous slides show the problem and the one above shows what worked
>>> for me.
>>>
>>> I saw this while trying to fix that:
>>>
>>> Author: Kan Liang <kan.liang@...ux.intel.com>
>>> commit abbdd79b786e036e60f01b7907977943ebe7a74d
>>> Date:   Tue Jan 23 10:50:32 2024 -0800
>>>
>>>     perf mem: Clean up perf_mem_events__name()
>>>     
>>>     Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
>>>     one.
>>>     
>>>     The mem_load events may have a different format. Add ldlat and aux_event
>>>     in the struct perf_mem_event to indicate the format and the extra aux
>>>     event.
>>>     
>>>     Add perf_mem_events_intel_aux[] to support the extra mem_load_aux event.
>>>     
>>>     Rename perf_mem_events__name to perf_pmu__mem_events_name.
>>>
>>> --------------------------
>>>
>>> So there are provisions for selecting the right events, but it didn't
>>> seem to be working when I tried. Can you take a look at what I describe
>>> on those slides and see what I am doing wrong?
>>>
>>
>> If I understand the example in the slides correctly, the issue is that
>> no mem events from the big core are selected when running perf mem
>> record, rather than the wrong mem events being selected.
>>
>> I don't see an obvious issue. That looks like a regression in perf mem
>> record. I will find an Alder Lake or Raptor Lake machine to take a
>> deeper look.
> 
> My expectation was for whatever is needed for having those events to be
> put in place, like I did manually, and indeed, limiting it to cpu_core:
> 
> taskset -c 0 \
>   perf record --weight --data \
>               --event '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/P}:S' \
>               --event cpu_core/mem-stores/ find / > /dev/null
> 
> I.e. lots of boilerplate for using 'perf mem record', we should at least
> have some sort of warning about the 'perf mem record' experience having
> to be restricted to workloads running on PMUs where it can take place,
> perhaps making 'perf mem record' restrict the CPUs used for a session
> to be the ones with the needed resources... and we have that already:
> 
> root@...ber:~# perf mem record sleep 1
> Memory events are enabled on a subset of CPUs: 16-27
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.032 MB perf.data ]
> root@...ber:~#
> 
> But...
> 
> root@...ber:~# perf evlist
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> dummy:u
> root@...ber:~# perf evlist -v
> cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f
> cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
> dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
> root@...ber:~# 
> 
> It is not setting up the required
> 
>   --event '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/P}:S'
> 
> part, right?
>

Right, it's a bug on ADL and RPL. The p-core of ADL and RPL requires
mem-loads-aux to work around a HW defect, so there are different
mem_events for the e-core and the p-core: perf_mem_events_intel[] and
perf_mem_events_intel_aux[], respectively.

Ideally, perf should initialize and set the corresponding config bit for
both mem_events. However, the current code only does it for the first
PMU, so the second PMU (p-core) is always ignored.

Other hybrid machines are not impacted, because the workaround is not
required there and both the e-core and the p-core share the same
perf_mem_events_intel[].
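
The failure mode can be illustrated with a small, self-contained C sketch.
This is not the perf source: the struct layouts, the two one-entry tables,
and the function names init_first_pmu_only() and init_all_pmus() are
hypothetical, chosen only to show why every PMU has to be walked when the
PMUs do not share one mem_events table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures perf uses. */
struct mem_event {
    const char *name;
    bool aux_event;   /* true when an extra mem-loads-aux event is needed */
};

struct pmu {
    const char *name;
    bool needs_aux;                 /* assumed: set for the ADL/RPL p-core */
    const struct mem_event *events; /* table picked at init, NULL if skipped */
};

static const struct mem_event events_plain = { "mem-loads", false };
static const struct mem_event events_aux   = { "mem-loads", true };

/* Pick the table that matches a single PMU. */
void init_pmu(struct pmu *p)
{
    assert(p != NULL);
    p->events = p->needs_aux ? &events_aux : &events_plain;
}

/* Buggy pattern: only the first PMU is set up, the rest stay NULL. */
void init_first_pmu_only(struct pmu *pmus, size_t n)
{
    if (n > 0)
        init_pmu(&pmus[0]);
}

/* Intended pattern: every PMU gets its own matching table. */
void init_all_pmus(struct pmu *pmus, size_t n)
{
    for (size_t i = 0; i < n; i++)
        init_pmu(&pmus[i]);
}

/* Count the PMUs that ended up with a usable table. */
size_t count_initialized(const struct pmu *pmus, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (pmus[i].events != NULL)
            count++;
    return count;
}
```

With an array holding cpu_atom first and cpu_core second, the first
function leaves cpu_core without events, which matches the 'perf evlist'
output quoted above. The second function gives cpu_core the aux variant.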

The patch set to fix it has been posted. Please take a look.
https://lore.kernel.org/lkml/20240905170737.4070743-1-kan.liang@linux.intel.com/

BTW, I also found a regression in 'perf mem record -e' while testing.
The fix for it is included in the above patch set as well.

Thanks,
Kan

> To make this more useful, perhaps we should, in addition to warning that
> it is running just on those CPUs when we specify a workload (sleep 1 in
> the above case), limit that workload to that set of CPUs so that we can
> get those mem events for all of the workload's runtime?
> 
> We would just add a new warning for that behaviour, etc.
> 
> - Arnaldo
>
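
As a rough illustration of the suggestion above, restricting a workload
first requires turning a CPU list such as the "16-27" printed by 'perf mem
record' into a set of CPU numbers. The sketch below does only that parsing
step. parse_cpu_list() and MAX_CPUS are hypothetical names for this
example, and it does not reflect how perf handles CPU maps internally.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 1024 /* arbitrary upper bound for this sketch */

/*
 * Parse a CPU list such as "16-27" or "0-3,8" into a boolean map.
 * Returns the number of distinct CPUs set, or -1 on malformed input.
 */
int parse_cpu_list(const char *list, bool cpus[MAX_CPUS])
{
    int count = 0;

    for (int i = 0; i < MAX_CPUS; i++)
        cpus[i] = false;

    while (*list != '\0') {
        int first, last, consumed;

        if (sscanf(list, "%d-%d%n", &first, &last, &consumed) == 2) {
            /* "first-last" range: bounds are validated below */
        } else if (sscanf(list, "%d%n", &first, &consumed) == 1) {
            last = first; /* single CPU number */
        } else {
            return -1;
        }

        if (first < 0 || last < first || last >= MAX_CPUS)
            return -1;

        for (int cpu = first; cpu <= last; cpu++) {
            if (!cpus[cpu]) {
                cpus[cpu] = true;
                count++;
            }
        }

        list += consumed;
        if (*list == ',')
            list++;          /* another entry follows */
        else if (*list == '\n' || *list == '\0')
            break;           /* end of the list */
        else
            return -1;       /* unexpected character */
    }

    return count;
}
```

On Linux, the resulting map could then be copied into a cpu_set_t and
passed to sched_setaffinity() before the workload is exec'ed, which is the
effect 'taskset -c' has in the manual command line quoted earlier.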