linux-kernel - Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C style file

Message-ID: <20150529072750.GA23124@gmail.com>
Date:	Fri, 29 May 2015 09:27:50 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Andi Kleen <ak@...ux.intel.com>
Cc:	Jiri Olsa <jolsa@...hat.com>, Namhyung Kim <namhyung@...nel.org>,
	Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...hat.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Paul Mackerras <paulus@...ba.org>,
	linuxppc-dev@...ts.ozlabs.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C
 style file


* Andi Kleen <ak@...ux.intel.com> wrote:

> > So instead of this flat structure, there should at minimum be broad categorization 
> > of the various parts of the hardware they relate to: whether they relate to the 
> > branch predictor, memory caches, TLB caches, memory ops, offcore, decoders, 
> > execution units, FPU ops, etc., etc. - so that they can be queried via 'perf 
> > list'.
> 
> The categorization is generally on the stem name, which already works fine with 
> the existing perf list wildcard support. So for example you only want branches.
>
> perf list br*
> ...
>   br_inst_exec.all_branches                         
>        [Speculative and retired branches]
>   br_inst_exec.all_conditional                      
>        [Speculative and retired macro-conditional branches]
>   br_inst_exec.all_direct_jmp                       
>        [Speculative and retired macro-unconditional branches excluding calls and indirects]
>   br_inst_exec.all_direct_near_call                 
>        [Speculative and retired direct near calls]
>   br_inst_exec.all_indirect_jump_non_call_ret       
>        [Speculative and retired indirect branches excluding calls and returns]
>   br_inst_exec.all_indirect_near_return             
>        [Speculative and retired indirect return branches]
> ...
> 
> Or mid level cache events:
> 
> perf list l2*
> ...
>   l2_l1d_wb_rqsts.all                               
>        [Not rejected writebacks from L1D to L2 cache lines in any state]
>   l2_l1d_wb_rqsts.hit_e                             
>        [Not rejected writebacks from L1D to L2 cache lines in E state]
>   l2_l1d_wb_rqsts.hit_m                             
>        [Not rejected writebacks from L1D to L2 cache lines in M state]
>   l2_l1d_wb_rqsts.miss                              
>        [Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)]
>   l2_lines_in.all                                   
>        [L2 cache lines filling L2]
> ...
> 
> There are some exceptions, but generally it works this way.

You are missing my point in several ways:

1)

Firstly, there are _tons_ of 'exceptions' to the 'stem name' grouping, to the 
level that makes it unusable for high level grouping of events.

Here's the 'stem name' histogram on the SandyBridge event list:

  $ grep EventName pmu-events/arch/x86/SandyBridge_core.json  | cut -d\. -f1 | cut -d\" -f4 | cut -d\_ -f1 | sort | uniq -c | sort -n

      1 AGU
      1 BACLEARS
      1 EPT
      1 HW
      1 ICACHE
      1 INSTS
      1 PAGE
      1 ROB
      1 RS
      1 SQ
      2 ARITH
      2 DSB2MITE
      2 ILD
      2 LOAD
      2 LOCK
      2 LONGEST
      2 MISALIGN
      2 SIMD
      2 TLB
      3 CPL
      3 DSB
      3 INST
      3 INT
      3 LSD
      3 MACHINE
      4 CPU
      4 OTHER
      4 PARTIAL
      5 CYCLE
      5 ITLB
      6 LD
      7 L1D
      8 DTLB
     10 FP
     12 RESOURCE
     21 UOPS
     24 IDQ
     25 MEM
     37 BR
     37 L2
    131 OFFCORE

That is 386 events in total. This grouping has the following severe problems:

  - that's 41 'stem name' groups, way too many for a first-hop high level
    structure. We want the kind of high level categorization I suggested:
    cache, decoding, branches, execution pipeline, memory events, vector unit
    events - broad categories which exist in all CPUs and are microarchitecture
    independent.

  - even these 'stem names' are mostly unstructured and unreadable. The two
    examples you cited are the best cases, which are borderline readable, but they
    cover less than 20% of all events.

  - the 'stem name' concept is not even used consistently, the names are 
    essentially a random collection of Intel internal acronyms, which occasionally 
    match up with high level concepts. These vendor defined names have very poor 
    high level structure.

  - the 'stem names' are totally imbalanced: there's one 'super' category 'stem
    name', OFFCORE_RESPONSE, with 131 events in it, and then there are the super
    small groups in the list above. That is not well suited to getting a good
    overview of what measurement capabilities the hardware has.
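
The figures in the list above can be rechecked from the histogram itself. The 
following is a minimal sketch in Python, assuming the counts printed above are 
exact; the dictionary is copied from that output, while `share` and the other 
variable names are made up for illustration:

```python
# Stem counts copied from the SandyBridge histogram above.
STEM_COUNTS = {
    "AGU": 1, "BACLEARS": 1, "EPT": 1, "HW": 1, "ICACHE": 1,
    "INSTS": 1, "PAGE": 1, "ROB": 1, "RS": 1, "SQ": 1,
    "ARITH": 2, "DSB2MITE": 2, "ILD": 2, "LOAD": 2, "LOCK": 2,
    "LONGEST": 2, "MISALIGN": 2, "SIMD": 2, "TLB": 2,
    "CPL": 3, "DSB": 3, "INST": 3, "INT": 3, "LSD": 3, "MACHINE": 3,
    "CPU": 4, "OTHER": 4, "PARTIAL": 4,
    "CYCLE": 5, "ITLB": 5,
    "LD": 6, "L1D": 7, "DTLB": 8, "FP": 10, "RESOURCE": 12,
    "UOPS": 21, "IDQ": 24, "MEM": 25, "BR": 37, "L2": 37,
    "OFFCORE": 131,
}

total = sum(STEM_COUNTS.values())   # number of events
groups = len(STEM_COUNTS)           # number of distinct stems


def share(*stems):
    """Percentage of all events that fall under the given stems."""
    return 100.0 * sum(STEM_COUNTS[s] for s in stems) / total


# Stems that contain exactly one event.
singletons = sum(1 for count in STEM_COUNTS.values() if count == 1)

print(f"{groups} stems, {total} events")
print(f"BR + L2 share: {share('BR', 'L2'):.1f}%")
print(f"OFFCORE share: {share('OFFCORE'):.1f}%")
print(f"stems with a single event: {singletons}")
```

With these counts, the two stems cited as examples (BR and L2) hold 74 of the 386 
events, while OFFCORE alone holds 131.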

So forget about using 'stem names' as the high level structure. These events have 
no high level structure and we should provide that, instead of dumping 380+ events 
on the unsuspecting user.

2)

Secondly, categorization and a higher level hierarchy should be used to keep the
list manageable. The fact that _you_ can list just a subset because you know what
to search for means nothing to the new user trying to discover events.

A simple 'perf list' should list the high level categories by default, with a 
count displayed that shows how many further events are within that category. 
(compacted tree output would be usable as well.)
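
To make the default listing concrete, here is a minimal sketch in Python of a 
category summary with per-category counts. It is only an illustration: the 
category attached to each event is a hand-assigned guess, the output format is 
invented, and none of this reflects how 'perf list' is actually implemented. The 
event names are taken from the examples quoted in this thread.

```python
from collections import Counter

# (vendor event name, category). The category column is hypothetical and
# assigned by hand here, purely for illustration.
EVENTS = [
    ("br_inst_exec.all_branches", "branches"),
    ("br_inst_exec.all_conditional", "branches"),
    ("br_inst_exec.all_direct_jmp", "branches"),
    ("l2_l1d_wb_rqsts.all", "cache"),
    ("l2_l1d_wb_rqsts.miss", "cache"),
    ("l2_lines_in.all", "cache"),
    ("idq.dsb_cycles", "decoding"),
]


def category_summary(events):
    """Return (category, event count) pairs sorted by category name."""
    counts = Counter(category for _, category in events)
    return sorted(counts.items())


def format_summary(events):
    """Render one line per category, with the number of events it holds."""
    return [
        f"  {category:<12} [{count} events]"
        for category, count in category_summary(events)
    ]


print("\n".join(format_summary(EVENTS)))
```

The point of the sketch is that the first screen shows a handful of categories 
and their sizes, and the individual events are only listed on request.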

> The stem could be put into a separate header, but it would seem redundant to me.

Higher level categories simply don't exist in these names in any usable form, so
they have to be created. Just redundantly repeating the 'stem name' would be silly,
as the stem names are unusable for the purposes of high level categorization.

> > We don't just want to import the unstructured mess that these event files are 
> > - we want to turn them into real structure. We can still keep the messy vendor 
> > names as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.
> 
> The vendor names directly map to the micro architecture, which is the whole point of 
> the events. IDQ is a part of the CPU, and is described in the CPU manuals. One 
> of the main motivations for adding event lists is to make perf match to that 
> documentation.

Your argument is a logical fallacy: there is absolutely no conflict between
supporting quirky vendor names and also having good high level structure and
naming, to make it all accessible to the first time user.
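
A minimal sketch in Python of how the two can coexist: each event keeps its 
vendor name and additionally carries a category, so both the existing wildcard 
style of query and a category query work on the same data. The category values 
and the helper names `by_glob` and `by_category` are assumptions made up for this 
example, not existing perf interfaces.

```python
from fnmatch import fnmatchcase

# (vendor event name, category). Vendor names are kept as-is; the category
# is a hypothetical, hand-assigned addition.
EVENTS = [
    ("br_inst_exec.all_branches", "branches"),
    ("l2_lines_in.all", "cache"),
    ("idq.dsb_cycles", "decoding"),
]


def by_glob(events, pattern):
    """Vendor-name query, in the style of 'perf list br*'."""
    pattern = pattern.lower()
    return [name for name, _ in events if fnmatchcase(name.lower(), pattern)]


def by_category(events, category):
    """High level query: every event filed under one category."""
    return [name for name, cat in events if cat == category]


print(by_glob(EVENTS, "br*"))
print(by_category(EVENTS, "cache"))
```

Someone who knows the manuals can still ask for IDQ.* by name, while a first 
time user can start from the category.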

> > 3)
> > 
> > There should be good 'perf list' visualization for these events: grouping, 
> > individual names, with a good interface to query details if needed. I.e. it 
> > should be possible to browse and discover events relevant to the CPU the tool 
> > is executing on.
> 
> I suppose we could change perf list to give the stem names as section headers to 
> make the long list a bit more readable.

No, the 'stem names' are crap - instead we want to create sensible high level
categories and categorize the events into them. I gave you a few ideas above and in
the previous mail.

> Generally you need to have some knowledge of the micro architecture to use these 
> events. There is no way around that.

Here your argument again relies on a logical fallacy: there is absolutely no 
conflict between good high level structure, and the idea that you need to know 
about CPUs to make sense of hardware events that deal with fine internal details.

Also, you are denying the plain fact that the highest level categories _are_ 
largely microarchitecture independent: can you show me a single modern mainstream 
x86 CPU that doesn't have these broad high level categories:

  - CPU cache
  - memory accesses
  - decoding, branch execution
  - execution pipeline
  - FPU, vector units

?

There's none, and the reason is simple: the high level structure of CPUs is still 
dictated by basic physics, and physics is microarchitecture independent.

Lower level structure will inevitably be microarchitecture and sometimes even 
model specific - but that's absolutely no excuse to not have good high level 
structure.

So these are not difficult concepts at all, please make an honest effort at
understanding them and responding to them, as properly addressing them is a
must-have for this patch submission.

Thanks,

	Ingo