lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 13 Apr 2023 22:29:22 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Ian Rogers <irogers@...gle.com>
Cc:     linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        acme@...nel.org, peterz@...radead.org, mingo@...hat.com,
        mark.rutland@....com, alexander.shishkin@...ux.intel.com,
        jolsa@...nel.org, namhyung@...nel.org, ravi.bangoria@....com,
        sandipan.das@....com, ananth.narayan@....com,
        gautham.shenoy@....com, eranian@...gle.com, puwen@...on.cn
Subject: Re: [RFC PATCH v3 2/2] perf: Add option for --per-cache aggregation

Hello Ian,

Thank you for reviewing the series.

On 4/13/2023 10:18 PM, Ian Rogers wrote:
> On Wed, Apr 12, 2023 at 11:22 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>
>> Processors based on chiplet architecture, such as AMD EPYC and Hygon do
>> not expose the chiplet details in the sysfs CPU topology information.
>> However, this information can be derived from the per CPU cache level
>> information from the sysfs.
>>
>> perf stat has already supported aggregation based on topology
>> information using core ID, socket ID, etc. It'll be useful to aggregate
>> based on the cache topology to detect problems like imbalance and
>> cache-to-cache sharing at various cache levels.
>>
>> This patch adds support for "--per-cache" option for aggregation at a
>> particular cache level. Also update the docs and related test. The
>> output will be like:
>>
>>   $ sudo perf stat --per-cache=L3 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- sleep 5
>>
>>    Performance counter stats for 'system wide':
>>
>>   S0-D0-L3-ID0             16         12,644,599      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID8             16         13,847,598      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID16            16            223,592      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID24            16              7,775      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID32            16             31,302      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID40            16             17,722      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID48            16              8,272      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S0-D0-L3-ID56            16              5,765      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID64            16         18,227,173      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID72            16         20,926,878      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID80            16             13,705      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID88            16             24,062      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID96            16             27,891      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID104           16             13,480      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID112           16             12,656      ls_dmnd_fills_from_sys.ext_cache_remote
>>   S1-D1-L3-ID120           16             21,881      ls_dmnd_fills_from_sys.ext_cache_remote
>>
>> Also support perf stat record and perf stat report with the ability to
>> specify a different cache level to aggregate data at when running perf
>> stat report.
>>
>> Suggested-by: Gautham R. Shenoy <gautham.shenoy@....com>
>> Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
>> ---
>> Changelog:
>> o v2->v3
>>   - Cache IDs are now derived from the shared_cpus_list making
>>     aggregation possible on older kernels that do not expose the IDs.
>>   - Updated the comments based on the new ID assignment strategy.
>>   - Better handle the case when specifying a level is possible as it is
>>     less than or equal to MAX_CACHE_LVL but it does not exist on the
>>     machine. In such cased ID will be -1.
>>
>>       $ sudo perf stat --per-cache=L4 -a -e cycles -- sleep 5
>>
>>         Performance counter stats for 'system wide':
>>
>>         S0-D0-L4-ID-1           128         51,328,613      cycles
>>         S1-D1-L4-ID-1           128        125,132,221      cycles
>>
>> o v1->v2
>>   - Fix cache level parsing logic. Previously, giving "--per-cache=2" was
>>     considered valid but that was not the intention. Only parse strings
>>     of form LX or lX where x is the cache level and can range from 1 to
>>     MAX_CACHE_LVL.
>> ---
>>  tools/lib/perf/include/perf/cpumap.h          |   5 +
>>  tools/lib/perf/include/perf/event.h           |   3 +-
>>  tools/perf/Documentation/perf-stat.txt        |  16 ++
>>  tools/perf/builtin-stat.c                     | 144 +++++++++++++++++-
>>  .../tests/shell/lib/perf_json_output_lint.py  |   4 +-
>>  tools/perf/tests/shell/stat+csv_output.sh     |  14 ++
>>  tools/perf/tests/shell/stat+json_output.sh    |  13 ++
> 
> I think you can break this patch apart. You can add the feature, then
> enable the CSV test, then the json test, etc. When adding the feature
> you can also probably split the changes of "struct aggr_cpu_id" from
> the display/enablement logic.

I agree this patch is huge. I'll break it down as per your suggestion in
the next version where I also plan to drop the RFC tag.

> Overall it looks good!

Thank you again for taking a look at the series.

> 
> Thanks,
> Ian
> 
>>  [..snip..]
>>

--
Thanks and Regards,
Prateek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ