[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <58782dd2-95d1-655d-7355-4170d2c56f05@amd.com>
Date: Thu, 13 Apr 2023 22:29:22 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Ian Rogers <irogers@...gle.com>
Cc: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
acme@...nel.org, peterz@...radead.org, mingo@...hat.com,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, namhyung@...nel.org, ravi.bangoria@....com,
sandipan.das@....com, ananth.narayan@....com,
gautham.shenoy@....com, eranian@...gle.com, puwen@...on.cn
Subject: Re: [RFC PATCH v3 2/2] perf: Add option for --per-cache aggregation
Hello Ian,
Thank you for reviewing the series.
On 4/13/2023 10:18 PM, Ian Rogers wrote:
> On Wed, Apr 12, 2023 at 11:22 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>>
>> Processors based on chiplet architecture, such as AMD EPYC and Hygon do
>> not expose the chiplet details in the sysfs CPU topology information.
>> However, this information can be derived from the per CPU cache level
>> information from the sysfs.
>>
>> perf stat has already supported aggregation based on topology
>> information using core ID, socket ID, etc. It'll be useful to aggregate
>> based on the cache topology to detect problems like imbalance and
>> cache-to-cache sharing at various cache levels.
>>
>> This patch adds support for "--per-cache" option for aggregation at a
>> particular cache level. Also update the docs and related test. The
>> output will be like:
>>
>> $ sudo perf stat --per-cache=L3 -a -e ls_dmnd_fills_from_sys.ext_cache_remote -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-L3-ID0 16 12,644,599 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID8 16 13,847,598 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID16 16 223,592 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID24 16 7,775 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID32 16 31,302 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID40 16 17,722 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID48 16 8,272 ls_dmnd_fills_from_sys.ext_cache_remote
>> S0-D0-L3-ID56 16 5,765 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID64 16 18,227,173 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID72 16 20,926,878 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID80 16 13,705 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID88 16 24,062 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID96 16 27,891 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID104 16 13,480 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID112 16 12,656 ls_dmnd_fills_from_sys.ext_cache_remote
>> S1-D1-L3-ID120 16 21,881 ls_dmnd_fills_from_sys.ext_cache_remote
>>
>> Also support perf stat record and perf stat report with the ability to
>> specify a different cache level to aggregate data at when running perf
>> stat report.
>>
>> Suggested-by: Gautham R. Shenoy <gautham.shenoy@....com>
>> Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>
>> ---
>> Changelog:
>> o v2->v3
>> - Cache IDs are now derived from the shared_cpus_list making
>> aggregation possible on older kernels that do not expose the IDs.
>> - Updated the comments based on the new ID assignment strategy.
>> - Better handle the case when specifying a level is possible as it is
>> less than or equal to MAX_CACHE_LVL but it does not exist on the
>> machine. In such cased ID will be -1.
>>
>> $ sudo perf stat --per-cache=L4 -a -e cycles -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S0-D0-L4-ID-1 128 51,328,613 cycles
>> S1-D1-L4-ID-1 128 125,132,221 cycles
>>
>> o v1->v2
>> - Fix cache level parsing logic. Previously, giving "--per-cache=2" was
>> considered valid but that was not the intention. Only parse strings
>> of form LX or lX where x is the cache level and can range from 1 to
>> MAX_CACHE_LVL.
>> ---
>> tools/lib/perf/include/perf/cpumap.h | 5 +
>> tools/lib/perf/include/perf/event.h | 3 +-
>> tools/perf/Documentation/perf-stat.txt | 16 ++
>> tools/perf/builtin-stat.c | 144 +++++++++++++++++-
>> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
>> tools/perf/tests/shell/stat+csv_output.sh | 14 ++
>> tools/perf/tests/shell/stat+json_output.sh | 13 ++
>
> I think you can break this patch apart. You can add the feature, then
> enable the CSV test, then the json test, etc. When adding the feature
> you can also probably split the changes of "struct aggr_cpu_id" from
> the display/enablement logic.
I agree this patch is huge. I'll break it down as per your suggestion in
the next version where I also plan to drop the RFC tag.
> Overall it looks good!
Thank you again for taking a look at the series.
>
> Thanks,
> Ian
>
>> [..snip..]
>>
--
Thanks and Regards,
Prateek
Powered by blists - more mailing lists