[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7cae74d6-dfb1-35a7-7dd4-69958e577d0c@linux.intel.com>
Date: Tue, 17 May 2022 09:43:28 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>
Cc: acme@...nel.org, mingo@...hat.com, jolsa@...nel.org,
namhyung@...nel.org, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, peterz@...radead.org,
zhengjun.xing@...ux.intel.com, adrian.hunter@...el.com,
ak@...ux.intel.com, eranian@...gle.com
Subject: Re: [PATCH V2 1/4] perf evsel: Fixes topdown events in a weak group
for the hybrid platform
On 5/16/2022 10:52 PM, Ian Rogers wrote:
> On Mon, May 16, 2022 at 8:25 AM <kan.liang@...ux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> The patch ("perf evlist: Keep topdown counters in weak group") fixes the
>> perf metrics topdown event issue when the topdown events are in a weak
>> group on a non-hybrid platform. However, it doesn't work for the hybrid
>> platform.
>>
>> $./perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
>> cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
>> cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
>> cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
>> cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
>> cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
>> cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>> 751,765,068 cpu_core/slots/ (84.07%)
>> <not supported> cpu_core/topdown-bad-spec/
>> <not supported> cpu_core/topdown-be-bound/
>> <not supported> cpu_core/topdown-fe-bound/
>> <not supported> cpu_core/topdown-retiring/
>> 12,398,197 cpu_core/branch-instructions/ (84.07%)
>> 1,054,218 cpu_core/branch-misses/ (84.24%)
>> 539,764,637 cpu_core/bus-cycles/ (84.64%)
>> 14,683 cpu_core/cache-misses/ (84.87%)
>> 7,277,809 cpu_core/cache-references/ (77.30%)
>> 222,299,439 cpu_core/cpu-cycles/ (77.28%)
>> 63,661,714 cpu_core/instructions/ (84.85%)
>> 0 cpu_core/mem-loads/ (77.29%)
>> 12,271,725 cpu_core/mem-stores/ (77.30%)
>> 542,241,102 cpu_core/ref-cycles/ (84.85%)
>> 8,854 cpu_core/cache-misses/ (76.71%)
>> 7,179,013 cpu_core/cache-references/ (76.31%)
>>
>> 1.003245250 seconds time elapsed
>>
>> A hybrid platform has a different PMU name for the core PMUs, while
>> the current perf hard code the PMU name "cpu".
>>
>> The evsel->pmu_name can be used to replace the "cpu" to fix the issue.
>> For a hybrid platform, the pmu_name must be non-NULL. Because there are
>> at least two core PMUs. The PMU has to be specified.
>> For a non-hybrid platform, the pmu_name may be NULL. Because there is
>> only one core PMU, "cpu". For a NULL pmu_name, we can safely assume that
>> it is a "cpu" PMU.
>>
>> In case other PMUs also define the "slots" event, checking the PMU type
>> as well.
>>
>> With the patch,
>>
>> $perf stat -e '{cpu_core/slots/,cpu_core/topdown-bad-spec/,
>> cpu_core/topdown-be-bound/,cpu_core/topdown-fe-bound/,
>> cpu_core/topdown-retiring/,cpu_core/branch-instructions/,
>> cpu_core/branch-misses/,cpu_core/bus-cycles/,cpu_core/cache-misses/,
>> cpu_core/cache-references/,cpu_core/cpu-cycles/,cpu_core/instructions/,
>> cpu_core/mem-loads/,cpu_core/mem-stores/,cpu_core/ref-cycles/,
>> cpu_core/cache-misses/,cpu_core/cache-references/}:W' -a sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>> 766,620,266 cpu_core/slots/ (84.06%)
>> 73,172,129 cpu_core/topdown-bad-spec/ # 9.5% bad speculation (84.06%)
>> 193,443,341 cpu_core/topdown-be-bound/ # 25.0% backend bound (84.06%)
>> 403,940,929 cpu_core/topdown-fe-bound/ # 52.3% frontend bound (84.06%)
>> 102,070,237 cpu_core/topdown-retiring/ # 13.2% retiring (84.06%)
>> 12,364,429 cpu_core/branch-instructions/ (84.03%)
>> 1,080,124 cpu_core/branch-misses/ (84.24%)
>> 564,120,383 cpu_core/bus-cycles/ (84.65%)
>> 36,979 cpu_core/cache-misses/ (84.86%)
>> 7,298,094 cpu_core/cache-references/ (77.30%)
>> 227,174,372 cpu_core/cpu-cycles/ (77.31%)
>> 63,886,523 cpu_core/instructions/ (84.87%)
>> 0 cpu_core/mem-loads/ (77.31%)
>> 12,208,782 cpu_core/mem-stores/ (77.31%)
>> 566,409,738 cpu_core/ref-cycles/ (84.87%)
>> 23,118 cpu_core/cache-misses/ (76.71%)
>> 7,212,602 cpu_core/cache-references/ (76.29%)
>>
>> 1.003228667 seconds time elapsed
>>
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>> ---
>> tools/perf/arch/x86/util/evsel.c | 21 +++++++++++++++++++--
>> 1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
>> index 00cb4466b4ca..6eda5a491eda 100644
>> --- a/tools/perf/arch/x86/util/evsel.c
>> +++ b/tools/perf/arch/x86/util/evsel.c
>> @@ -31,10 +31,27 @@ void arch_evsel__fixup_new_cycles(struct perf_event_attr *attr)
>> free(env.cpuid);
>> }
>>
>> +static bool evsel__sys_has_perf_metrics(const struct evsel *evsel)
>> +{
>
> Why have this and not use topdown_sys_has_perf_metrics? It seems
> strange to have the mix of evsel and PMU that this function is
> testing.
This function tells whether the evsel's PMU supports perf metrics.
The topdown_sys_has_perf_metrics() tells whether there is a PMU on the
machine which supports perf metrics.
I don't think we want to group the topdown events on the cpu_atom.
The evsel__sys_has_perf_metrics() can filter out the cpu_atom case.
I will add comments for the two functions to clarify.
>
>> + const char *pmu_name = evsel->pmu_name ? evsel->pmu_name : "cpu";
>> +
>> + /*
>> + * The PERF_TYPE_RAW type is the core PMU type.
>> + * The slots event is only available for the core PMU, which
>> + * supports the perf metrics feature.
>
> nit: Does "core PMU" mean /sys/devices/cpu_core ? It would be good to
> disambiguate possibly by just using "cpu_core PMU".
No. The core PMU is to be distinguished with the uncore PMU.
For a non-hybrid machine, the core PMU is the "cpu" PMU.
For a hybrid machine, the core PMU is the "cpu_core" PMU.
I don't think we want to specify the "cpu_core" PMU here, because the
function should work for both hybrid and non-hybrid platforms.
Thanks,
Kan
>
> Thanks,
> Ian
>
>
>> + * Checking both the PERF_TYPE_RAW type and the slots event
>> + * should be good enough to detect the perf metrics feature.
>> + */
>> + if ((evsel->core.attr.type == PERF_TYPE_RAW) &&
>> + pmu_have_event(pmu_name, "slots"))
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> bool arch_evsel__must_be_in_group(const struct evsel *evsel)
>> {
>> - if ((evsel->pmu_name && strcmp(evsel->pmu_name, "cpu")) ||
>> - !pmu_have_event("cpu", "slots"))
>> + if (!evsel__sys_has_perf_metrics(evsel))
>> return false;
>>
>> return evsel->name &&
>> --
>> 2.35.1
>>
Powered by blists - more mailing lists