linux-kernel - Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse cores

Message-ID: <48b2cf46-d96b-4d2d-5a56-97a88566edcc@linux.alibaba.com>
Date:   Tue, 29 Mar 2022 21:34:11 +0800
From:   Shuai Xue <xueshuai@...ux.alibaba.com>
To:     Leo Yan <leo.yan@...aro.org>, Ali Saidi <alisaidi@...zon.com>
Cc:     Nick.Forrington@....com, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, andrew.kilroy@....com,
        benh@...nel.crashing.org, german.gomez@....com,
        james.clark@....com, john.garry@...wei.com, jolsa@...nel.org,
        kjain@...ux.ibm.com, lihuafei1@...wei.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-perf-users@...r.kernel.org, mark.rutland@....com,
        mathieu.poirier@...aro.org, mingo@...hat.com, namhyung@...nel.org,
        peterz@...radead.org, will@...nel.org
Subject: Re: [PATCH v4 2/4] perf arm-spe: Use SPE data source for neoverse
 cores

Hi Leo, Ali,

Thank you for your great work and valuable discussion.

On 2022/3/27 3:43 AM, Ali Saidi wrote:
> Hi Leo,
> On Sat, 26 Mar 2022 21:47:54 +0800, Leo Yan wrote:
>> Hi Ali, German,
>>
>> On Thu, Mar 24, 2022 at 06:33:21PM +0000, Ali Saidi wrote:
>>
>> [...]
>>
>>> +static void arm_spe__synth_data_source_neoverse(const struct arm_spe_record *record,
>>> +						union perf_mem_data_src *data_src)
>>>  {
>>> -	union perf_mem_data_src	data_src = { 0 };
>>> +	/*
>>> +	 * Even though four levels of cache hierarchy are possible, no known
>>> +	 * production Neoverse systems currently include more than three levels
>>> +	 * so for the time being we assume three exist. If a production system
>>> +	 * is built with four then this function would have to be changed to
>>> +	 * detect the number of levels for reporting.
>>> +	 */
>>>
>>> -	if (record->op == ARM_SPE_LD)
>>> -		data_src.mem_op = PERF_MEM_OP_LOAD;
>>> -	else
>>> -		data_src.mem_op = PERF_MEM_OP_STORE;
>>
>> Firstly, I apologize for not giving a clear idea when Ali sent patch
>> sets v2 and v3.
>>
>> IMHO, two kinds of information can guide us toward a reliable
>> implementation.  The first is a summary of the data source
>> configuration for x86 PEBS, which we can examine in more detail; the
>> second is the AMBA architecture document ARM IHI 0050E.b, section
>> 11.1.2 'Crossing a chip-to-chip interface' and its subsection
>> 'Suggested DataSource values', which helps a lot in mapping the cache
>> topology to the Arm SPE data source.
>>
>> As a result, I summarized the data source configurations for PEBS and
>> Arm SPE Neoverse in the spreadsheet:
>> https://docs.google.com/spreadsheets/d/11YmjG0TyRjH7IXgvsREFgTg3AVtxh2dvLloRK1EdNjU/edit?usp=sharing
>
> Thanks for putting this together and digging into the details, but you're making
> assumptions in the Neoverse data sources about the core configurations that
> aren't correct. The Neoverse cores all have integrated L1 and L2 caches, so if
> the line is coming from a peer core we don't know which level it's actually
> coming from.  Similarly, if it's coming from a local cluster, that could mean a
> cluster L3, but it's not the L2.

As far as I know, the L3 cache in the Neoverse N2 microarchitecture is
non-inclusive, while L1 and L2 are strictly inclusive, as on Intel Skylake SP
(SKX); i.e., a line held in L2 may or may not also be in L3 (no guarantee is
made). That is to say, we cannot tell whether a line comes from the cluster L2
or L3. Could you confirm this?

[...]


> I still think we should consider extending the memory levels to
> demonstrate a clear memory hierarchy on Arm archs. I personally like the
> definitions for "PEER_CORE", "LCL_CLSTR", "PEER_CLSTR" and "SYS_CACHE";
> though these cache levels are not as precise as the L1/L2/L3 levels, they
> map very well to the cache topology on Arm archs and do so without
> any confusion.  We could take this as an enhancement if you don't want
> to hold up the upstreaming of the current patch set.

Agreed. In my opinion, imprecise cache levels can lead to wrong conclusions;
"PEER_CORE", "LCL_CLSTR", "PEER_CLSTR" and "SYS_CACHE" are more intuitive.

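To make the idea concrete, here is a minimal sketch of what
topology-oriented levels could look like. Only the four quoted level names
come from the discussion above; the enum, its ordering, and the helpers
topo_src_name() and topo_src_is_exact_level() are hypothetical names for
illustration and are not part of the perf ABI or of this patch set.

```c
#include <stdbool.h>

/*
 * Hypothetical topology-oriented data source levels. The values are
 * auto-numbered for illustration only; they do not mirror any hardware
 * encoding or any existing perf_mem_data_src field.
 */
enum topo_src {
	TOPO_L1,
	TOPO_L2,
	TOPO_PEER_CORE,		/* cache of another core in the same cluster */
	TOPO_LCL_CLSTR,		/* cache shared within the local cluster */
	TOPO_PEER_CLSTR,	/* cache belonging to another cluster */
	TOPO_SYS_CACHE,		/* system-level cache */
	TOPO_DRAM,
	TOPO_SRC_MAX,
};

/* Return a printable name for a source, or "N/A" if out of range. */
static const char *topo_src_name(enum topo_src src)
{
	static const char * const names[TOPO_SRC_MAX] = {
		[TOPO_L1]		= "L1",
		[TOPO_L2]		= "L2",
		[TOPO_PEER_CORE]	= "PEER_CORE",
		[TOPO_LCL_CLSTR]	= "LCL_CLSTR",
		[TOPO_PEER_CLSTR]	= "PEER_CLSTR",
		[TOPO_SYS_CACHE]	= "SYS_CACHE",
		[TOPO_DRAM]		= "DRAM",
	};

	if ((unsigned int)src >= TOPO_SRC_MAX)
		return "N/A";
	return names[src];
}

/*
 * True only when the source names a single numbered cache level.
 * The topology-oriented sources say where the line came from, not
 * which level served it, so a tool should not print them as L2/L3.
 */
static bool topo_src_is_exact_level(enum topo_src src)
{
	return src == TOPO_L1 || src == TOPO_L2;
}
```

A report could then print the name from topo_src_name() as-is and attach a
numbered level only when topo_src_is_exact_level() returns true.
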
Best Regards,
Shuai