Date:   Sat, 1 Oct 2016 09:44:06 -0400
From:   Joe Mario <jmario@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>, Jiri Olsa <jolsa@...nel.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Michael Trapp <michael.trapp@....com>,
        "Long, Wai Man" <waiman.long@....com>,
        Stanislav Ievlev <stanislav.ievlev@...il.com>,
        Kim Phillips <kim.phillips@....com>,
        lkml <linux-kernel@...r.kernel.org>,
        Don Zickus <dzickus@...hat.com>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        David Ahern <dsahern@...il.com>,
        Andi Kleen <andi@...stfloor.org>,
        Stephane Eranian <eranian@...gle.com>,
        Robert Hundt <rhundt@...gle.com>
Subject: Re: [PATCHv4 00/57] perf c2c: Add new tool to analyze cacheline
 contention on NUMA systems

On 09/29/2016 05:19 AM, Peter Zijlstra wrote:
  
>
> What I want is a tool that maps memop events (any PEBS memops) back to a
> 'type::member' form and sorts on that. That doesn't rely on the PEBS
> 'Data Linear Address' field, as that is useless for dynamically
> allocated bits. Instead it would use the IP and Dwarf information to
> deduce the 'type::member' of the memop.
>
> I want pahole like output, showing me where the hits (green) and misses
> (red) are in a structure.

I agree that would give valuable insight, but it needs to be
in addition to what this c2c provides today, not a replacement for it.

Ten years ago Robert Hundt created that pahole-style output as a developer option
to the HP-UX compiler.  It used compiler feedback to compute every struct
accessed by the application, with exact counts for all reads and writes to
every struct member.  It even had affinity information to show how often
field members were accessed together in time.

He and I ran it on numerous large applications.  It was awesome, but it
fell short in a few areas that Jiri's c2c patches cover, such as
being able to:

- distinguish where the concurrent cacheline accesses came from (e.g., which
   cores, and which nodes).

- see where the loads got resolved from (local cache, local memory, remote
   cache, remote memory).

- see if the hot structs were cacheline aligned or not.

- see if more than one hot struct shares a cacheline.

- see how costly, via load latencies, the contention is.

- see, among all the accesses to a cacheline, which thread or process is
   causing the most harm.

- gain insight into how many other threads/processes are contending for a
   cacheline (and who they are).

For everyone who has used the "perf c2c" prototype, the above info has
been critical to understanding how best to tackle the contention it
uncovered.

So yes, the pahole-style addition would be a plus, and it would make it
easier to map accesses back to the struct, but please make sure to preserve
what the current "perf c2c" provides that the pahole-style output will not.

Joe



