Message-ID: <aC_axwJjoYIsRjer@x1>
Date: Thu, 22 May 2025 23:17:43 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
Weilin Wang <weilin.wang@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Ravi Bangoria <ravi.bangoria@....com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] perf pmu intel: Adjust cpumasks for sub-NUMA clusters
on graniterapids
On Tue, May 20, 2025 at 12:45:24PM -0400, Liang, Kan wrote:
> On 2025-05-19 10:00 p.m., Namhyung Kim wrote:
> > On Sun, May 18, 2025 at 10:45:52AM -0700, Ian Rogers wrote:
> >> On Sun, May 18, 2025 at 10:09 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >> I think we're agreeing. I wonder whether the intent of the aggregation
> >> number is to let you work out an average from the aggregated count. So
> >> for core PMUs you divide the count by the aggregation number and get
> >> the average count per core (CPU?). If we're getting an aggregated
> >> count of, say, uncore memory controller events, then it would make
> >> sense to me to show the aggregated total with the aggregation count
> >> being the number of memory controller PMUs, so we can have an average
> >> per memory controller. This should line up with using the number of
> >> file descriptors.
> > Sounds right.
> >> I think this isn't the current behavior; on perf v6.12:
> >> ```
> >> $ sudo perf stat --per-socket -e data_read -a sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >> S0 1 2,484.96 MiB data_read
> >>
> >> 1.001365319 seconds time elapsed
> >>
> >> $ sudo perf stat -A -e data_read -a sleep 1
> >>
> >> Performance counter stats for 'system wide':
> >>
> >> CPU0 1,336.48 MiB data_read [uncore_imc_free_running_0]
> >> CPU0 1,337.06 MiB data_read [uncore_imc_free_running_1]
> >>
> >> 1.001049096 seconds time elapsed
> >> ```
> >> so the aggregation number shows 1, but 2 events were aggregated together.
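> >>
> >> To make that concrete (assuming the two free-running IMC PMUs are the
> >> only contributors to the socket count): the aggregation number for the
> >> --per-socket run should arguably be 2, giving an average per memory
> >> controller of 2,484.96 / 2 = 1,242.48 MiB, rather than the reported 1.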
> >
> > Ugh.. right. Merging uncore PMU instances can add more confusion. :(
> >
> >>
> >> I think computing the aggregation number in the stat code is probably
> >> wrong. The value should be constant for an evsel and aggr_cpu_id; it's
> >> just that computing it for an aggr_cpu_id is a pain due to needing
> >> topology and/or PMU information. The code is ripe for refactoring. I'd
> >> prefer not to do that as part of this change, though, which addresses
> >> a particular Intel Granite Rapids issue.
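> >>
> >> As a rough sketch of the kind of helper such a refactor might
> >> introduce (the function name, callback type and details here are
> >> hypothetical, not the current perf-tools code):
> >>
> >> ```
> >> /* Hypothetical: count how many entries of the evsel's CPU map fall
> >>  * into a given aggregation id. For an uncore PMU each map entry is
> >>  * one PMU instance, so this would also count merged instances,
> >>  * matching the number of file descriptors opened for the aggregate. */
> >> typedef struct aggr_cpu_id (*get_id_fn)(struct perf_cpu cpu);
> >>
> >> static int evsel__aggr_nr(struct evsel *evsel, struct aggr_cpu_id id,
> >>                           get_id_fn get_id)
> >> {
> >>         struct perf_cpu cpu;
> >>         int idx, nr = 0;
> >>
> >>         perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> >>                 struct aggr_cpu_id cid = get_id(cpu);
> >>
> >>                 if (aggr_cpu_id__equal(&id, &cid))
> >>                         nr++;
> >>         }
> >>         return nr;
> >> }
> >> ```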
> >
> > That's ok. Just one more TODO item...
> Sounds good to me as well.
> I've verified this patch with SNC-2. The rest looks good to me.
Thanks, applied to perf-tools-next. It will go public tomorrow, after I
redo the tests. Off to bed now...
- Arnaldo