lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 9 Sep 2021 16:36:47 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Michael Ellerman <mpe@...erman.id.au>
Cc:     Kajol Jain <kjain@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org, mingo@...hat.com, acme@...nel.org,
        jolsa@...nel.org, namhyung@...nel.org,
        linux-perf-users@...r.kernel.org, ak@...ux.intel.com,
        maddy@...ux.ibm.com, atrajeev@...ux.vnet.ibm.com,
        rnsastry@...ux.ibm.com, yao.jin@...ux.intel.com, ast@...nel.org,
        daniel@...earbox.net, songliubraving@...com,
        kan.liang@...ux.intel.com, mark.rutland@....com,
        alexander.shishkin@...ux.intel.com, paulus@...ba.org
Subject: Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

On Thu, Sep 09, 2021 at 10:45:54PM +1000, Michael Ellerman wrote:

> > The 'new' composite doesnt have a hops field because the hardware that
> > nessecitated that change doesn't report it, but we could easily add a
> > field there.
> >
> > Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
> > corresponding PERF_MEM_HOPS_{NA, 0..6}
> 
> It's really 7 if we use remote && hop = 0 to mean the first hop.

I don't think we can do that, becaus of backward compat. Currently:

  lvl_num=DRAM, remote=1

denites: "Remote DRAM of any distance". Effectively it would have the new
hops field filled with zeros though, so if you then decode with the hops
field added it suddenly becomes:

 lvl_num=DRAM, remote=1, hops=0

and reads like: "Remote DRAM of 0 hops" which is quite daft. Therefore 0
really must denote a 'N/A'.

> If we're wanting to use some of the hop levels to represent
> intra-chip/package hops then we could possibly use them all on a really
> big system.
> 
> eg. you could imagine something like:
> 
>  L2 | 		        - local L2
>  L2 | REMOTE | HOPS_0	- L2 of neighbour core
>  L2 | REMOTE | HOPS_1	- L2 of near core on same chip (same 1/2 of chip)
>  L2 | REMOTE | HOPS_2	- L2 of far core on same chip (other 1/2 of chip)
>  L2 | REMOTE | HOPS_3	- L2 of sibling chip in same package
>  L2 | REMOTE | HOPS_4	- L2 on separate package 1 hop away
>  L2 | REMOTE | HOPS_5	- L2 on separate package 2 hops away
>  L2 | REMOTE | HOPS_6	- L2 on separate package 3 hops away
> 
> 
> Whether it's useful to represent all those levels I'm not sure, but it's
> probably good if we have the ability.

I'm thinking we ought to keep hops as steps along the NUMA fabric, with
0 hops being the local node. That only gets us:

 L2, remote=0, hops=HOPS_0 -- our L2
 L2, remote=1, hops=HOPS_0 -- L2 on the local node but not ours
 L2, remote=1, hops!=HOPS_0 -- L2 on a remote node

> I guess I'm 50/50 on whether that's enough levels, or whether we want
> another bit to allow for future growth.

Right, possibly safer to add one extra bit while we can.... I suppose.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ