Message-ID: <87k0jjl9sp.fsf@mpe.ellerman.id.au>
Date: Tue, 14 Sep 2021 20:40:38 +1000
From: Michael Ellerman <mpe@...erman.id.au>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Kajol Jain <kjain@...ux.ibm.com>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, mingo@...hat.com, acme@...nel.org,
jolsa@...nel.org, namhyung@...nel.org,
linux-perf-users@...r.kernel.org, ak@...ux.intel.com,
maddy@...ux.ibm.com, atrajeev@...ux.vnet.ibm.com,
rnsastry@...ux.ibm.com, yao.jin@...ux.intel.com, ast@...nel.org,
daniel@...earbox.net, songliubraving@...com,
kan.liang@...ux.intel.com, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, paulus@...ba.org
Subject: Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

Peter Zijlstra <peterz@...radead.org> writes:
> On Thu, Sep 09, 2021 at 10:45:54PM +1000, Michael Ellerman wrote:
>
>> > The 'new' composite doesn't have a hops field because the hardware that
>> > necessitated that change doesn't report it, but we could easily add a
>> > field there.
>> >
>> > Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
>> > corresponding PERF_MEM_HOPS_{NA, 0..6}
>>
>> It's really 7 if we use remote && hop = 0 to mean the first hop.
>
> I don't think we can do that, because of backward compat. Currently:
>
> lvl_num=DRAM, remote=1
>
> denotes: "Remote DRAM of any distance". Effectively it would have the new
> hops field filled with zeros though, so if you then decode with the hops
> field added it suddenly becomes:
>
> lvl_num=DRAM, remote=1, hops=0
>
> and reads like: "Remote DRAM of 0 hops" which is quite daft. Therefore 0
> really must denote a 'N/A'.

Ah yeah, duh, it needs to be backward compatible.
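
To make the backward-compat point concrete, here is a minimal decode sketch. It is not the real perf ABI: the struct layout, the EX_* names and the value used for DRAM are all made up for illustration. The only thing taken from the discussion above is that an old record arrives with the new field zero-filled, so zero has to decode as N/A.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical names and layout, for illustration only (not the perf ABI). */
#define EX_LVLNUM_DRAM 0x0a	/* assumed value */
#define EX_HOPS_NA     0	/* what a zero-filled field in an old record reads as */

struct ex_data_src {
	unsigned int lvl_num : 4;
	unsigned int remote  : 1;
	unsigned int hops    : 3;	/* the proposed new field */
};

/* Describe the distance part of a record; returns buf for convenience. */
static const char *ex_describe_distance(struct ex_data_src s, char *buf, size_t len)
{
	if (!s.remote)
		snprintf(buf, len, "local");
	else if (s.hops == EX_HOPS_NA)
		/* Old records land here and keep their old meaning. */
		snprintf(buf, len, "remote, any distance");
	else
		snprintf(buf, len, "remote, hops code %u", (unsigned int)s.hops);
	return buf;
}
```

With this shape, a record written before the field existed (remote=1, hops=0) still reads as "remote, any distance" rather than "0 hops".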
>> If we're wanting to use some of the hop levels to represent
>> intra-chip/package hops then we could possibly use them all on a really
>> big system.
>>
>> eg. you could imagine something like:
>>
>> L2 | - local L2
>> L2 | REMOTE | HOPS_0 - L2 of neighbour core
>> L2 | REMOTE | HOPS_1 - L2 of near core on same chip (same 1/2 of chip)
>> L2 | REMOTE | HOPS_2 - L2 of far core on same chip (other 1/2 of chip)
>> L2 | REMOTE | HOPS_3 - L2 of sibling chip in same package
>> L2 | REMOTE | HOPS_4 - L2 on separate package 1 hop away
>> L2 | REMOTE | HOPS_5 - L2 on separate package 2 hops away
>> L2 | REMOTE | HOPS_6 - L2 on separate package 3 hops away
>>
>>
>> Whether it's useful to represent all those levels I'm not sure, but it's
>> probably good if we have the ability.
>
> I'm thinking we ought to keep hops as steps along the NUMA fabric, with
> 0 hops being the local node. That only gets us:
>
> L2, remote=0, hops=HOPS_0 -- our L2
> L2, remote=1, hops=HOPS_0 -- L2 on the local node but not ours
> L2, remote=1, hops!=HOPS_0 -- L2 on a remote node

Hmm. I'm not sure about tying it directly to NUMA hops. I worry we're
going to see more and more systems where there's a hierarchy within the
chip/package, in addition to the traditional NUMA hierarchy.

Although then I guess it becomes a question of what exactly is a NUMA
hop, maybe the answer is that on those future systems those
intra-chip/package hops should be represented as NUMA hops.

It's not like we have a hard definition of what a NUMA hop is?
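
For reference, the three-way split in the quoted proposal could be decoded roughly like this. It is only a sketch: the EX_* names are hypothetical, and the encoding (0 = N/A, HOPS_n stored as n + 1) is an assumption that follows from zero having to mean N/A. Combinations the proposal doesn't define are reported as unknown.

```c
/* Hypothetical encoding: 0 is N/A, HOPS_n is stored as n + 1. */
enum ex_hops {
	EX_HOPS_NA = 0,
	EX_HOPS_0  = 1,
	EX_HOPS_1  = 2,
	/* ... further hop values would follow ... */
};

enum ex_l2_where {
	EX_L2_OURS,		/* remote=0, hops=HOPS_0 */
	EX_L2_LOCAL_NODE,	/* remote=1, hops=HOPS_0 */
	EX_L2_REMOTE_NODE,	/* remote=1, hops!=HOPS_0 */
	EX_L2_UNKNOWN,		/* legacy or undefined combination */
};

static enum ex_l2_where ex_classify_l2(unsigned int remote, unsigned int hops)
{
	if (hops == EX_HOPS_NA)
		return EX_L2_UNKNOWN;	/* record predates the hops field */
	if (!remote)
		return hops == EX_HOPS_0 ? EX_L2_OURS : EX_L2_UNKNOWN;
	return hops == EX_HOPS_0 ? EX_L2_LOCAL_NODE : EX_L2_REMOTE_NODE;
}
```

Under this reading, any intra-chip/package hierarchy would only show up if those hops are counted as hop values above HOPS_0.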
>> I guess I'm 50/50 on whether that's enough levels, or whether we want
>> another bit to allow for future growth.
>
> Right, possibly safer to add one extra bit while we can.... I suppose.

Equally it's not _that_ hard to add another bit later (if there's still
one free), makes the API a little uglier to use, but not the end of the
world.
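
To show what "a little uglier" might look like, here is a sketch where the extra bit ends up somewhere else in the word because the neighbouring bits were already taken. The layout and the ex_* names are hypothetical and not a proposal for the actual struct.

```c
/*
 * Hypothetical split layout: the original 3 bits, then unrelated fields,
 * then the one bit that was added later.
 */
struct ex_split_hops {
	unsigned int hops_lo : 3;
	unsigned int other   : 4;	/* fields allocated in between */
	unsigned int hops_hi : 1;
};

/* Users can no longer read one field, they have to stitch two together. */
static unsigned int ex_hops_get(struct ex_split_hops s)
{
	return ((unsigned int)s.hops_hi << 3) | s.hops_lo;
}

static void ex_hops_set(struct ex_split_hops *s, unsigned int hops)
{
	s->hops_lo = hops & 0x7;	/* low three bits */
	s->hops_hi = (hops >> 3) & 0x1;	/* fourth bit, stored separately */
}
```

Every consumer then needs accessors like these instead of reading a single field.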
cheers