Date:   Mon, 21 Feb 2022 20:41:43 +0000
From:   German Gomez <german.gomez@....com>
To:     Leo Yan <leo.yan@...aro.org>, Ali Saidi <alisaidi@...zon.com>
Cc:     acme@...nel.org, alexander.shishkin@...ux.intel.com,
        andrew.kilroy@....com, benh@...nel.crashing.org,
        james.clark@....com, john.garry@...wei.com, jolsa@...hat.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-perf-users@...r.kernel.org, mark.rutland@....com,
        mathieu.poirier@...aro.org, mingo@...hat.com, namhyung@...nel.org,
        peterz@...radead.org, will@...nel.org
Subject: Re: [PATCH 2/2] perf arm-spe: Parse more SPE fields and store source

Hi Leo, Ali,

On 12/02/2022 04:19, Leo Yan wrote:
> Hi German, Ali,
>
> On Fri, Feb 11, 2022 at 04:31:40PM +0000, German Gomez wrote:
>> Hi Ali,
>>
>> [...]
> Let's step back a bit and divide the decoding flow into two parts:
> backend and frontend.

Sorry for derailing the conversation.

(I made some additional comments on the generation of samples below)

> enum arm_spe_op_type {
>         /* First level operation type */
> 	ARM_SPE_OP_OTHER        = 1 << 0,
> 	ARM_SPE_OP_LDST		= 1 << 1,
> 	ARM_SPE_OP_BRANCH_ERET  = 1 << 2,
>
>         /* Second level operation type for OTHER */
>         ARM_SPE_OP_SVE_OTHER    = 1 << 16,
>         ARM_SPE_OP_SVE_FP       = 1 << 17,
>         ARM_SPE_OP_SVE_PRED_OTHER = 1 << 18,
>
>         /* Second level operation type for LDST */
>         ARM_SPE_OP_LD           = 1 << 16,
>         ARM_SPE_OP_ST           = 1 << 17,
>         ARM_SPE_OP_ATOMIC       = 1 << 18,
>         ARM_SPE_OP_EXCL         = 1 << 19,
>         ARM_SPE_OP_AR           = 1 << 20,
>         ARM_SPE_OP_SIMD_FP      = 1 << 21,
>         ARM_SPE_OP_GP_REG       = 1 << 22,
>         ARM_SPE_OP_UNSPEC_REG   = 1 << 23,
>         ARM_SPE_OP_NV_SYSREG    = 1 << 24,
>         ARM_SPE_OP_SVE_PRED     = 1 << 25,
>         ARM_SPE_OP_SVE_SG       = 1 << 26,
>
>         /* Second level operation type for BRANCH_ERET */
>         ARM_SPE_OP_BR_COND      = 1 << 16,
>         ARM_SPE_OP_BR_INDIRECT  = 1 << 17,
> };
>
> IIUC, Ali suggested directly reusing the packet format to express the
> operation types, so we don't need to redefine them as in the structure
> arm_spe_op_type above.  I personally prefer converting the raw packet
> format into a more readable form; a benefit is that this would give us
> more readable code.

I personally like this method as well
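
As a rough illustration of that conversion (sketch only: the
SPE_OP_PKT_IS_* helpers below are placeholders rather than the real
arm-spe-pkt-decoder.h macros, and record->op is the field this patch
set would add):

    static void arm_spe_set_op_type(struct arm_spe_record *record,
                                    const struct arm_spe_pkt *packet)
    {
            if (SPE_OP_PKT_IS_LDST(packet)) {
                    record->op |= ARM_SPE_OP_LDST;
                    record->op |= SPE_OP_PKT_IS_ST(packet) ?
                                  ARM_SPE_OP_ST : ARM_SPE_OP_LD;
                    if (SPE_OP_PKT_IS_ATOMIC(packet))
                            record->op |= ARM_SPE_OP_ATOMIC;
            } else if (SPE_OP_PKT_IS_BR_ERET(packet)) {
                    record->op |= ARM_SPE_OP_BRANCH_ERET;
            }
    }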

>
> For the frontend, we need to convert Arm SPE records to samples.
> We can explore two fields: sample::flags and sample::data_src.  For
> load/store related operations, I prefer to fill more complete info
> in the field sample::data_src and extend the definitions for
> perf_mem_data_src; for branch operations, we can fill sample::flags.
>
> So I am just wondering if we can set the field
> sample::data_src::mem_lock for atomic operations, like:
>
>     data_src.mem_op   = PERF_MEM_OP_LOAD;
>     data_src.mem_lock = PERF_MEM_LOCK_ATOMIC;
>
> The field "mem_lock" is only two bits; we can consider extending the
> structure with an extra field "mem_lock_ext" if it cannot meet our
> requirement.

Indeed it makes more sense to use data_src as much as possible. Thanks
for pointing that out.
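
On the tooling side the fill-in could then look roughly like below (the
helper name is made up, and PERF_MEM_LOCK_ATOMIC is only the proposed
value, it isn't in perf_event.h today):

    static u64 arm_spe__synth_atomic_data_src(void)
    {
            union perf_mem_data_src data_src = { .mem_op = PERF_MEM_OP_LOAD };

            /* would become PERF_MEM_LOCK_ATOMIC once such a value exists */
            data_src.mem_lock = PERF_MEM_LOCK_LOCKED;
            return data_src.val;
    }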

Some comments:

# ARM_SPE_OP_ATOMIC

  This might be a hack, but can we not represent it as both LD & ST, as
  an atomic op combines both?

  data_src.mem_op = PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE;

# ARM_SPE_OP_EXCL (instructions ldxr/stxr)

  x86 doesn't seem to have instructions with similar semantics (please
  correct me if I'm wrong). For that arch, PERF_MEM_LOCK_LOCKED probably
  suffices.

  PPC seems to have instructions similar to the arm64 ones (lwarx/stwcx).
  I don't know if it also has instructions with the same semantics as x86.

  I think it makes sense to have a PERF_MEM_LOCK_EXCL. If not, reusing
  PERF_MEM_LOCK_LOCKED is the quicker alternative.
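
  Putting the two cases together, the mapping could be something like
  this (sketch only; record->op holds the proposed ARM_SPE_OP_* flags,
  and PERF_MEM_LOCK_EXCL would be the new value if we add it):

    if (record->op & ARM_SPE_OP_ATOMIC)
            /* an atomic RMW touches memory as both a load and a store */
            data_src.mem_op = PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE;

    if (record->op & ARM_SPE_OP_EXCL)
            /* PERF_MEM_LOCK_EXCL if we add it, otherwise reuse LOCKED */
            data_src.mem_lock = PERF_MEM_LOCK_LOCKED;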

# ARM_SPE_OP_SVE_SG

  (I'm sorry if this is too far out of scope of the original patch. Let
  me know if you would prefer to discuss it on a separate channel)

  On a separate note, I'm also looking at incorporating some of the SVE
  bits into the perf samples.

  For this, do you think it makes sense to add two new mem_* fields to
  perf_mem_data_src:

  mem_vector (2 bits)
    - simd
    - other (SVE in arm64)

  mem_src (1 bit)
    - sparse (scatter/gather loads/stores in SVE, as well as simd)
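
  As a sketch, following the existing PERF_MEM_*/PERF_MEM_S() style (the
  names and bit positions below are placeholders, just to show the idea):

    #define PERF_MEM_VECTOR_SIMD    0x01 /* Advanced SIMD access */
    #define PERF_MEM_VECTOR_OTHER   0x02 /* other, e.g. SVE on arm64 */
    #define PERF_MEM_VECTOR_SHIFT   46   /* placeholder bit position */

    #define PERF_MEM_SRC_SPARSE     0x01 /* scatter/gather access */
    #define PERF_MEM_SRC_SHIFT      48   /* placeholder bit position */

    /* e.g. for an SVE gather load: */
    data_src.val |= PERF_MEM_S(VECTOR, OTHER) | PERF_MEM_S(SRC, SPARSE);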

---
Thanks,
German

>>>>> +	ARM_SPE_BR		= 1 << 5,
>>>>> +	ARM_SPE_BR_COND		= 1 << 6,
>>>>> +	ARM_SPE_BR_IND		= 1 << 7,
>> Seems like we can store BR_COND in the existing "branch-miss" event
>> (--itrace=b) with:
>>
>> sample->flags = PERF_IP_FLAG_BRANCH;
>> sample->flags |= PERF_IP_FLAG_CONDITIONAL;
>> and/or
>> sample->flags |= PERF_IP_FLAG_INDIRECT;
>>
>> PERF_IP_FLAG_INDIRECT doesn't exist yet but we can probably add it.
> Yes, for branch samples, this makes sense to me.
>
> Thanks,
> Leo
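
Something like the below, then, using the ARM_SPE_OP_* names from your
proposed enum (PERF_IP_FLAG_INDIRECT is the new bit we'd add; the value
is just a placeholder next to the existing PERF_IP_FLAG_* defines):

    #define PERF_IP_FLAG_INDIRECT   (1ULL << 13) /* placeholder bit */

    sample->flags = PERF_IP_FLAG_BRANCH;
    if (record->op & ARM_SPE_OP_BR_COND)
            sample->flags |= PERF_IP_FLAG_CONDITIONAL;
    if (record->op & ARM_SPE_OP_BR_INDIRECT)
            sample->flags |= PERF_IP_FLAG_INDIRECT;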
