Message-ID: <8644996b-33d6-4eee-890c-f23a3c830b77@linux.intel.com>
Date: Fri, 6 Sep 2024 12:08:52 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: namhyung@...nel.org, irogers@...gle.com, jolsa@...nel.org,
 adrian.hunter@...el.com, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] perf mem: Fix missed p-core mem events on ADL and RPL



On 2024-09-06 10:17 a.m., Arnaldo Carvalho de Melo wrote:
> On Thu, Sep 05, 2024 at 03:47:03PM -0400, Liang, Kan wrote:
>> On 2024-09-05 3:33 p.m., Arnaldo Carvalho de Melo wrote:
>>> On Thu, Sep 05, 2024 at 10:07:36AM -0700, kan.liang@...ux.intel.com wrote:
>>>> From: Kan Liang <kan.liang@...ux.intel.com>
>>>>
>>>> The p-core mem events are missed when launching perf mem record on ADL
>>>> and RPL.
>>>>
>>>> root@...ber:~# perf mem record sleep 1
>>>> Memory events are enabled on a subset of CPUs: 16-27
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.032 MB perf.data ]
>>>> root@...ber:~# perf evlist
>>>> cpu_atom/mem-loads,ldlat=30/P
>>>> cpu_atom/mem-stores/P
>>>> dummy:u
>>>>
>>>> The 'record' variable in struct perf_mem_event indicates whether
>>>> a mem event in a mem_events[] should be recorded. The current code only
>>>> configures the variable for the first eligible PMU. That is good enough
>>>> for a non-hybrid machine or a hybrid machine whose PMUs share the same
>>>> mem_events[]. However, if a different mem_events[] is used for different
>>>> PMUs on a hybrid machine, e.g., ADL or RPL, the 'record' for the second
>>>> PMU never gets a chance to be set. The mem_events[] of the second PMU
>>>> are always ignored.
>>>>
>>>> Perf mem doesn't support the per-PMU configuration now. A
>>>> per-PMU mem_events[] 'record' variable doesn't make sense. Make it
>>>> global. That could also avoid searching for the per-PMU mem_events[]
>>>> via perf_pmu__mem_events_ptr every time.
>>>>
>>>> Fixes: abbdd79b786e ("perf mem: Clean up perf_mem_events__name()")
>>>> Reported-by: Arnaldo Carvalho de Melo <acme@...nel.org>
>>>> Closes: https://lore.kernel.org/lkml/Zthu81fA3kLC2CS2@x1/
>>>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>>>
>>> Looks better:
>>>
>>> root@...ber:~# perf report --header-only | grep 'cmdline\|event'
>>> # cmdline : /home/acme/bin/perf mem record ls 
>>> # event : name = cpu_atom/mem-loads,ldlat=30/P, , id = { 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511 }, type = 10 (cpu_atom), size = 136, config = 0x5d0 (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, { bp_addr, config1 } = 0x1f
>>> # event : name = cpu_atom/mem-stores/P, , id = { 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523 }, type = 10 (cpu_atom), size = 136, config = 0x6d0 (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
>>> # event : name = cpu_core/mem-loads-aux/, , id = { 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539 }, type = 4 (cpu_core), size = 136, config = 0x8203 (mem-loads-aux), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1
>>> # event : name = cpu_core/mem-loads,ldlat=30/, , id = { 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556 }, type = 4 (cpu_core), size = 136, config = 0x1cd (mem-loads), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, freq = 1, precise_ip = 2, sample_id_all = 1, exclude_guest = 1, { bp_addr, config1 } = 0x1f
>>> # event : name = cpu_core/mem-stores/P, , id = { 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572 }, type = 4 (cpu_core), size = 136, config = 0x2cd (mem-stores), { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, freq = 1, enable_on_exec = 1, precise_ip = 3, sample_id_all = 1
>>> # event : name = dummy:u, , id = { 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600 }, type = 1 (software), size = 136, config = 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|ADDR|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, inherit = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1, task = 1, mmap_data = 1, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
>>> # intel_pt pmu capabilities: topa_multiple_entries=1, psb_cyc=1, single_range_output=1, mtc_periods=249, ip_filtering=1, output_subsys=0, cr3_filtering=1, psb_periods=3f, event_trace=0, cycle_thresholds=3f, power_event_trace=0, mtc=1, payloads_lip=0, ptwrite=1, num_address_ranges=2, max_subleaf=1, topa_output=1, tnt_disable=0
>>> root@...ber:~# perf evlist
>>> cpu_atom/mem-loads,ldlat=30/P
>>> cpu_atom/mem-stores/P
>>> cpu_core/mem-loads-aux/
>>> cpu_core/mem-loads,ldlat=30/
>>> cpu_core/mem-stores/P
>>> dummy:u
>>> root@...ber:~#
>>>
>>> But can we reconstruct the events relationship (group, :S, etc) from
>>> what we have in the perf.data header?
>>>
>>
>> Do you mean show the group relation in the perf evlist?
>>
>> $perf mem record sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.027 MB perf.data (10 samples) ]
>>
>> $perf evlist -g
>> cpu_atom/mem-loads,ldlat=30/P
>> cpu_atom/mem-stores/P
>> {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
>> cpu_core/mem-stores/P
>> dummy:u
>>
>> The -g option already does this, although the group modifier appears to be lost.
> 
> Right, I can reproduce that, but I wonder if we shouldn't make this '-g'
> option the default?

I think evlist means a list of events, so outputting only the events by
default makes sense to me.
The -g option then adds the extra group relationship information.
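
To illustrate the difference between the two output modes, here is a rough
sketch in Python (not the perf implementation). The function name
format_evlist and the (name, leader index) tuples are assumptions made up
for this example; only the event names come from the output quoted above.

```python
from typing import List, Tuple

# (event name, index of its group leader); an ungrouped event leads itself.
# Event names are copied from the 'perf evlist -g' output in this thread.
EVENTS: List[Tuple[str, int]] = [
    ("cpu_atom/mem-loads,ldlat=30/P", 0),
    ("cpu_atom/mem-stores/P", 1),
    ("cpu_core/mem-loads-aux/", 2),
    ("cpu_core/mem-loads,ldlat=30/", 2),  # member of the group led by index 2
    ("cpu_core/mem-stores/P", 4),
    ("dummy:u", 5),
]


def format_evlist(events: List[Tuple[str, int]], show_groups: bool = False) -> List[str]:
    """Return one output line per event, or one per group when show_groups is set."""
    if not show_groups:
        # Default mode: a plain list of events, no relationship information.
        return [name for name, _ in events]

    members = {}  # leader index -> names of the events in that group
    order = []    # leader indices in first-seen order
    for name, leader in events:
        if leader not in members:
            members[leader] = []
            order.append(leader)
        members[leader].append(name)

    lines = []
    for leader in order:
        names = members[leader]
        # Only multi-event groups get braces; single events print as-is.
        lines.append("{" + ",".join(names) + "}" if len(names) > 1 else names[0])
    return lines


if __name__ == "__main__":
    print("\n".join(format_evlist(EVENTS)))
    print("-----")
    print("\n".join(format_evlist(EVENTS, show_groups=True)))
```

The default mode stays a flat event list, and the grouped mode only adds
braces around events that share a leader.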

> 
> -----
> 
> Committer testing:
> 
>   root@...ber:~# perf evlist -g
>   cpu_atom/mem-loads,ldlat=30/P
>   cpu_atom/mem-stores/P
>   {cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}
>   cpu_core/mem-stores/P
>   dummy:u
>   root@...ber:~#
> 
> The :S for '{cpu_core/mem-loads-aux/,cpu_core/mem-loads,ldlat=30/}' is
> not being added by 'perf evlist -g', to be checked.
> 
> -----

It should be a generic issue, not just for perf evlist -g.

The same issue can be observed for perf report.
$perf report --header-only | grep 'cmdline\|group'
# cmdline : /home/kan/tmp/perf-tools-next/tools/perf/perf record -e {cycles,instructions}:u sleep 1
# group: {cycles,instructions}

I think it's because the per-group modifiers are converted to per-event
modifiers and stored in each evsel when the group is parsed. It's hard to
reconstruct the exact group string from the evsels alone, unless we
record the group string somewhere, e.g., in the leader evsel, when
parsing it.
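
A toy model of the problem, as a sketch only: it is written in Python
rather than the C used by perf, it handles only simple event names without
embedded commas, and the Evsel class, its user_only flag, and the group_str
field are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evsel:
    name: str
    user_only: bool = False  # stands in for the per-event effect of ':u'
    leader: Optional["Evsel"] = field(default=None, repr=False, compare=False)
    group_str: Optional[str] = None  # hypothetical: original string, leader only


def parse_group(spec: str) -> List[Evsel]:
    """Parse '{a,b}:mods'; the group modifier is folded into every evsel."""
    body, _, mods = spec.partition("}")
    names = body.lstrip("{").split(",")
    mods = mods.lstrip(":")
    evsels = [Evsel(name=n, user_only=("u" in mods)) for n in names]
    leader = evsels[0]
    for evsel in evsels:
        evsel.leader = leader
    # Assumed fix: remember the group string as typed, on the leader.
    leader.group_str = spec
    return evsels


def group_from_evsels(evsels: List[Evsel]) -> str:
    """Rebuild the group from names only; the group-level modifier is gone."""
    return "{" + ",".join(e.name for e in evsels) + "}"


def group_from_leader(evsels: List[Evsel]) -> str:
    """Prefer the string saved on the leader, else fall back to the names."""
    return evsels[0].group_str or group_from_evsels(evsels)


if __name__ == "__main__":
    evsels = parse_group("{cycles,instructions}:u")
    print(group_from_evsels(evsels))
    print(group_from_leader(evsels))
```

Rebuilding from the evsels alone drops the ':u', which matches the
'# group: {cycles,instructions}' header line above, while the string saved
on the leader keeps it.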

Thanks,
Kan
