[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yj8age/PSIouUiKy@kernel.org>
Date: Sat, 26 Mar 2022 10:52:01 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Leo Yan <leo.yan@...aro.org>
Cc: Ali Saidi <alisaidi@...zon.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, german.gomez@....com,
benh@...nel.crashing.org, Nick.Forrington@....com,
alexander.shishkin@...ux.intel.com, andrew.kilroy@....com,
james.clark@....com, john.garry@...wei.com, jolsa@...nel.org,
kjain@...ux.ibm.com, lihuafei1@...wei.com, mark.rutland@....com,
mathieu.poirier@...aro.org, mingo@...hat.com, namhyung@...nel.org,
peterz@...radead.org, will@...nel.org
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse
cores
Em Sat, Mar 26, 2022 at 09:47:54PM +0800, Leo Yan escreveu:
> Hi Ali, German,
>
> On Thu, Mar 24, 2022 at 06:33:21PM +0000, Ali Saidi wrote:
>
> [...]
>
> > +static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
> > + union perf_mem_data_src *data_src)
> > {
> > - union perf_mem_data_src data_src = { 0 };
> > + /*
> > + * Even though four levels of cache hierarchy are possible, no known
> > + * production Neoverse systems currently include more than three levels
> > + * so for the time being we assume three exist. If a production system
> > + * is built with four the this function would have to be changed to
> > + * detect the number of levels for reporting.
> > + */
> >
> > - if (record->op == ARM_SPE_LD)
> > - data_src.mem_op = PERF_MEM_OP_LOAD;
> > - else
> > - data_src.mem_op = PERF_MEM_OP_STORE;
>
> Firstly, apologize that I didn't give clear idea when Ali sent patch sets
> v2 and v3.
Ok, removing this as well.
Thanks for reviewing.
- Arnaldo
> IMHO, we need to consider two kinds of information which can guide us
> for a reliable implementation. The first thing is to summarize the data
> source configuration for x86 PEBS, we can dive in more details for this
> part; the second thing is we can refer to the AMBA architecture document
> ARM IHI 0050E.b, section 11.1.2 'Crossing a chip-to-chip interface' and
> its sub section 'Suggested DataSource values', which would help us
> much for mapping the cache topology to Arm SPE data source.
>
> As a result, I summarized the data source configurations for PEBS and
> Arm SPE Neoverse in the spreadsheet:
> https://docs.google.com/spreadsheets/d/11YmjG0TyRjH7IXgvsREFgTg3AVtxh2dvLloRK1EdNjU/edit?usp=sharing
>
> Please see below comments.
>
> > + switch (record->source) {
> > + case ARM_SPE_NV_L1D:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
> > + break;
>
> I think we need to set the field 'mem_snoop' for L1 cache hit:
>
> data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
>
> For L1 cache hit, it doesn't involve snooping.
>
> > + case ARM_SPE_NV_L2:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
> > + break;
>
> Ditto:
>
> data_src->mem_snoop = PERF_MEM_SNOOP_NONE;
>
> > + case ARM_SPE_NV_PEER_CORE:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> Peer core contains its local L1 cache, so I think we can set the
> memory level L1 to indicate this case.
>
> For this data source type and below types, though they indicate
> the snooping happens, but it doesn't mean the data in the cache line
> is in 'modified' state. If set flag PERF_MEM_SNOOP_HITM, I personally
> think this will mislead users when report the result.
>
> I prefer we set below fields for ARM_SPE_NV_PEER_CORE:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L1;
>
> > + break;
> > + /*
> > + * We don't know if this is L1, L2 but we do know it was a cache-2-cache
> > + * transfer, so set SNOOP_HITM
> > + */
> > + case ARM_SPE_NV_LCL_CLSTR:
>
> For ARM_SPE_NV_LCL_CLSTR, it fetches the data from the shared cache in
> the cluster level, it should happen in L2 cache:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L2;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L2;
>
> > + case ARM_SPE_NV_PEER_CLSTR:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
> > + break;
>
> This type can snoop from L1 or L2 cache in the peer cluster, so it
> makes sense to set cache level as PERF_MEM_LVLNUM_ANY_CACHE. But here
> should use the snoop type PERF_MEM_SNOOP_HIT, so:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> > + /*
> > + * System cache is assumed to be L3
> > + */
> > + case ARM_SPE_NV_SYS_CACHE:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
> > + break;
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_L3;
>
> > + /*
> > + * We don't know what level it hit in, except it came from the other
> > + * socket
> > + */
> > + case ARM_SPE_NV_REMOTE:
> > + data_src->mem_snoop = PERF_MEM_SNOOP_HITM;
> > + data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> > + break;
>
> The type ARM_SPE_NV_REMOTE is a snooping operation and it can happen
> in any cache levels in remote chip:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT;
> data_src->mem_snoop = PERF_MEM_SNOOP_HIT;
> data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
>
> > + case ARM_SPE_NV_DRAM:
> > + data_src->mem_lvl = PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
> > + break;
>
> We can set snoop as PERF_MEM_SNOOP_MISS for DRAM data source:
>
> data_src->mem_lvl = PERF_MEM_LVL_HIT;
> data_src->mem_snoop = PERF_MEM_SNOOP_MISS;
> data_src->mem_lvl_num = PERF_MEM_LVLNUM_RAM;
>
> The rest of this patch looks good to me.
>
> Thanks,
> Leo
>
> > + default:
> > + break;
> > + }
> > +}
> >
> > +static void arm_spe__synth_data_source_generic(const struct arm_spe_record *record,
> > + union perf_mem_data_src *data_src)
> > +{
> > if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
> > - data_src.mem_lvl = PERF_MEM_LVL_L3;
> > + data_src->mem_lvl = PERF_MEM_LVL_L3;
> >
> > if (record->type & ARM_SPE_LLC_MISS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> > + data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> > else
> > - data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> > } else if (record->type & (ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS)) {
> > - data_src.mem_lvl = PERF_MEM_LVL_L1;
> > + data_src->mem_lvl = PERF_MEM_LVL_L1;
> >
> > if (record->type & ARM_SPE_L1D_MISS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_MISS;
> > + data_src->mem_lvl |= PERF_MEM_LVL_MISS;
> > else
> > - data_src.mem_lvl |= PERF_MEM_LVL_HIT;
> > + data_src->mem_lvl |= PERF_MEM_LVL_HIT;
> > }
> >
> > if (record->type & ARM_SPE_REMOTE_ACCESS)
> > - data_src.mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> > + data_src->mem_lvl |= PERF_MEM_LVL_REM_CCE1;
> > +}
> > +
> > +static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 midr)
> > +{
> > + union perf_mem_data_src data_src = { 0 };
> > + bool is_neoverse = is_midr_in_range(midr, neoverse_spe);
> > +
> > + if (record->op & ARM_SPE_LD)
> > + data_src.mem_op = PERF_MEM_OP_LOAD;
> > + else
> > + data_src.mem_op = PERF_MEM_OP_STORE;
> > +
> > + if (is_neoverse)
> > + arm_spe__synth_data_source_neoverse(record, &data_src);
> > + else
> > + arm_spe__synth_data_source_generic(record, &data_src);
> >
> > if (record->type & (ARM_SPE_TLB_ACCESS | ARM_SPE_TLB_MISS)) {
> > data_src.mem_dtlb = PERF_MEM_TLB_WK;
> > @@ -446,7 +525,7 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
> > u64 data_src;
> > int err;
> >
> > - data_src = arm_spe__synth_data_source(record);
> > + data_src = arm_spe__synth_data_source(record, spe->midr);
> >
> > if (spe->sample_flc) {
> > if (record->type & ARM_SPE_L1D_MISS) {
> > @@ -1183,6 +1262,8 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
> > struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info;
> > size_t min_sz = sizeof(u64) * ARM_SPE_AUXTRACE_PRIV_MAX;
> > struct perf_record_time_conv *tc = &session->time_conv;
> > + const char *cpuid = perf_env__cpuid(session->evlist->env);
> > + u64 midr = strtol(cpuid, NULL, 16);
> > struct arm_spe *spe;
> > int err;
> >
> > @@ -1202,6 +1283,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
> > spe->machine = &session->machines.host; /* No kvm support */
> > spe->auxtrace_type = auxtrace_info->type;
> > spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];
> > + spe->midr = midr;
> >
> > spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
> >
> > --
> > 2.32.0
> >
--
- Arnaldo
Powered by blists - more mailing lists