linux-kernel - Re: [PATCH 2/2] perf arm-spe: Parse more SPE fields and store source

Message-ID: <46ba455e-277d-6618-f710-e92a2243898d@arm.com>
Date:   Fri, 25 Feb 2022 12:40:38 +0000
From:   German Gomez <german.gomez@....com>
To:     Ali Saidi <alisaidi@...zon.com>
Cc:     acme@...nel.org, alexander.shishkin@...ux.intel.com,
        andrew.kilroy@....com, benh@...nel.crashing.org,
        james.clark@....com, john.garry@...wei.com, jolsa@...hat.com,
        leo.yan@...aro.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        mark.rutland@....com, mathieu.poirier@...aro.org, mingo@...hat.com,
        namhyung@...nel.org, peterz@...radead.org, will@...nel.org
Subject: Re: [PATCH 2/2] perf arm-spe: Parse more SPE fields and store source


On 22/02/2022 19:29, Ali Saidi wrote:
> Hi German & Yan,
>
> Sorry about the delay in responding.
>
>> Hi German, Ali,
>>
> [...]
>>>>>  };
>>>>>>  
>>>>>>  enum arm_spe_op_type {
>>>>>>  	ARM_SPE_LD		= 1 << 0,
>>>>>>  	ARM_SPE_ST		= 1 << 1,
>>>>>> +	ARM_SPE_LDST_EXCL	= 1 << 2,
>>>>>> +	ARM_SPE_LDST_ATOMIC	= 1 << 3,
>>>>>> +	ARM_SPE_LDST_ACQREL	= 1 << 4,
>>> Wondering if we can store this in perf_sample->flags. The values are
>>> defined in "util/event.h" (PERF_IP_*). Maybe we can extend it to allow
>>> doing "sample->flags = PERF_LDST_FLAG_LD | PERF_LDST_FLAG_ATOMIC" and
>>> such.
>>>
>>> @Leo do you think that could work?
>> Let's step back a bit and divide the decoding flow into two parts:
>> backend and frontend.
>>
>> For the backend part, we decode the SPE hardware trace data and
>> generate the SPE record in the file
>> util/arm-spe-decoder/arm-spe-decoder.c.  As we want to support
>> complete operation types, we can extend arm_spe_op_type as below:
>>
>> enum arm_spe_op_type {
>>        /* First level operation type */
>> 	ARM_SPE_OP_OTHER        = 1 << 0,
>> 	ARM_SPE_OP_LDST		= 1 << 1,
> [...]
>
> I'm OK with this approach, but perhaps instead the op type should
> just be the raw trace's op-type and op-type-payload? Macros to decode
> this information are already present and extensively used in the text
> decoding of the packet. While it's a little bit harder than just picking
> a bit, the op_type is only used in a single place today outside of
> the existing textual script decoding and what would be this decoding.
> Do we foresee many more uses that would justify having to maintain

I wanted to include some of the SVE/SIMD bits in the perf samples, and
I would be using a few of these flags for that.

> the immediate format vs finding a way to unify arm_spe_pkt_desc_op_type
> to support both the text decoding and this?
>
> [...]
>> So I am just wondering if we can set the field
>> sample::data_src::mem_lock for atomic operations, like:
>>
>>    data_src.mem_op   = PERF_MEM_OP_LOAD;
>>    data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;
>>
>> The field "mem_lock" is only two bits; we can consider extending the
>> structure with an extra field "mem_lock_ext" if it cannot meet our
>> requirement.
> These are for the LOCK instruction on x86. I don't know that we want to
> overload the meaning here. Minimally there is value in differentiating
> exclusives vs atomics.
>
>>>>>> +	ARM_SPE_BR		= 1 << 5,
>>>>>> +	ARM_SPE_BR_COND		= 1 << 6,
>>>>>> +	ARM_SPE_BR_IND		= 1 << 7,
>>> Seems like we can store BR_COND in the existing "branch-miss" event
>>> (--itrace=b) with:
>>>
>>> sample->flags = PERF_IP_FLAG_BRANCH;
>>> sample->flags |= PERF_IP_FLAG_CONDITIONAL;
>>> and/or
>>> sample->flags |= PERF_IP_FLAG_INDIRECT;
>>>
>>> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.
>> Yes, for branch samples, this makes sense for me.
> makes sense to me too.
>
> Ali
>
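
To make the branch part of the discussion concrete, here is a minimal
sketch of how the decoder's op type bits could be translated into sample
flags. It is not taken from the patch. The enum mirrors the values quoted
above; the SKETCH_IP_FLAG_* macros are stand-ins whose bit positions are
made up for illustration (they are not the PERF_IP_FLAG_* values from
util/event.h), and the INDIRECT flag is hypothetical since it does not
exist yet. The helper name spe_branch_sample_flags() is also an
assumption, as is the idea that ARM_SPE_BR is always set together with
the COND/IND bits.

```c
#include <stdint.h>

/* Backend (decoder) op type bits, as proposed in the quoted patch. */
enum arm_spe_op_type {
	ARM_SPE_LD		= 1 << 0,
	ARM_SPE_ST		= 1 << 1,
	ARM_SPE_LDST_EXCL	= 1 << 2,
	ARM_SPE_LDST_ATOMIC	= 1 << 3,
	ARM_SPE_LDST_ACQREL	= 1 << 4,
	ARM_SPE_BR		= 1 << 5,
	ARM_SPE_BR_COND		= 1 << 6,
	ARM_SPE_BR_IND		= 1 << 7,
};

/*
 * Stand-ins for the frontend sample flags. The bit positions are
 * illustrative only; SKETCH_IP_FLAG_INDIRECT is hypothetical.
 */
#define SKETCH_IP_FLAG_BRANCH		(1u << 0)
#define SKETCH_IP_FLAG_CONDITIONAL	(1u << 1)
#define SKETCH_IP_FLAG_INDIRECT		(1u << 2)

/* Hypothetical helper: map decoder op bits to branch sample flags. */
static uint32_t spe_branch_sample_flags(uint32_t op)
{
	uint32_t flags;

	/* Non-branch records (loads, stores, ...) carry no branch flags. */
	if (!(op & ARM_SPE_BR))
		return 0;

	flags = SKETCH_IP_FLAG_BRANCH;
	if (op & ARM_SPE_BR_COND)
		flags |= SKETCH_IP_FLAG_CONDITIONAL;
	if (op & ARM_SPE_BR_IND)
		flags |= SKETCH_IP_FLAG_INDIRECT;

	return flags;
}
```

With something along these lines, the frontend would assign the result to
sample->flags when synthesizing a branch sample, and the load/store bits
would stay on the data_src side of the discussion.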