lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 4 Jan 2021 10:09:38 +0800
From:   Leo Yan <leo.yan@...aro.org>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Joe Mario <jmario@...hat.com>, David Ahern <dsahern@...il.com>,
        Don Zickus <dzickus@...hat.com>, Al Grant <Al.Grant@....com>,
        James Clark <james.clark@....com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads

On Sun, Jan 03, 2021 at 11:52:19PM +0100, Jiri Olsa wrote:
> On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> > This patch set is to sort cache line for all load operations which hit
> > any cache levels.  For single cache line view, it shows the load
> > references for loads with cache hits and with cache misses respectively.
> > 
> > This series is a following for the old patch set "perf c2c: Sort
> > cacheline with LLC load" [1], in the old patch set it tries to sort
> > cache line with the load operations in last level cache (LLC), after
> > testing we found the trace data doesn't contain LLC events if the
> > platform isn't a NUMA system.  For this reason, this series refines the
> > implementation to sort on all cache levels hits of load operations; it's
> > reasonable for us to review the load and store opreations, if detects
> > any cache line is accessed by multi-threads, this hints that the cache
> > line is possible for false sharing.
> > 
> > This patch set is clearly applied on perf/core branch with the latest
> > commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> > evlist' record methods").  And the changes has been tested on x86 and
> > Arm64, the testing result is shown as below.
> 
> SNIP
> 
> > 
> > 
> >   [...]
> > 
> > Changes from v1:
> > * Changed from sorting on LLC to sorting on all loads with cache hits;
> > * Added patches 06/11, 07/11 for refactoring macros;
> > * Added patch 08/11 for refactoring node header, so can display "%loads"
> >   rather than "%hitms" in the header;
> > * Added patch 09/11 to add local pointers for pointing to output metrics
> >   string and sort string (Juri);
> > * Added warning in percent_hitm() for the display "all", which should
> >   never happen (Juri).
> > 
> > [1] https://lore.kernel.org/patchwork/cover/1321514/
> > 
> > 
> > Leo Yan (11):
> >   perf c2c: Add dimensions for total load hit
> >   perf c2c: Add dimensions for load hit
> >   perf c2c: Add dimensions for load miss
> >   perf c2c: Rename for shared cache line stats
> >   perf c2c: Refactor hist entry validation
> >   perf c2c: Refactor display filter macro
> >   perf c2c: Refactor node display macro
> >   perf c2c: Refactor node header
> >   perf c2c: Add local variables for output metrics
> >   perf c2c: Sort on all cache hit for load operations
> >   perf c2c: Update documentation for display option 'all'
> > 
> >  tools/perf/Documentation/perf-c2c.txt |  21 +-
> >  tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
> >  2 files changed, 487 insertions(+), 82 deletions(-)
> 
> Joe might want to test it first, but it looks all good to me:
> 
> Acked-by: Jiri Olsa <jolsa@...hat.com>

Thanks for the review, Jiri.

Note, after testing with Arm SPE, we found the store operations don't
contain the information for L1 cache hit or miss, this leads to there
have no statistics for "st_l1hit" and "st_l1miss"; finally the single
cache line view only can show the load samples and fails to show store
opreations due to the empty statistics for "st_l1hit" and "st_l1miss".

This is related the hardware issue, after some discussion internally,
so far cannot find a easy way to set memory flag for L1 cache hit or
miss for store operations (more specific, set flags PERF_MEM_LVL_HIT or
PERF_MEM_LVL_MISS for store's L1 cache accessing).

Given it is uncertain for this issue, please hold on for this patch
series and I will resend if have any conclusion.

And really sorry I notify this too late.

Thanks,
Leo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ