lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Wed,  7 Apr 2021 10:44:52 +0800
From:   Jin Yao <yao.jin@...ux.intel.com>
To:     acme@...nel.org, jolsa@...nel.org, peterz@...radead.org,
        mingo@...hat.com, alexander.shishkin@...ux.intel.com
Cc:     Linux-kernel@...r.kernel.org, ak@...ux.intel.com,
        kan.liang@...el.com, yao.jin@...el.com,
        Jin Yao <yao.jin@...ux.intel.com>
Subject: [PATCH] perf report: Fix wrong LBR block sorting

When '--total-cycles' is specified, it supports sorting for all blocks
by 'Sampled Cycles%'. This is useful to concentrate on the globally
hottest blocks.

'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles

But in current code, it doesn't use the cycles aggregation. Part of 'cycles'
counting is possibly dropped for some overlap jumps. But for identifying
the hot block, we always need the full cycles.

  # perf record -b ./triad_loop
  # perf report --total-cycles --stdio

Before:

  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
  # ...............  ..............  ...........  ..........  ......................................................................  ....................
  #
              0.81%             793        4.32%         793                                    [setup-vdso.h:34 -> setup-vdso.h:40]            ld-2.27.so
              0.49%             480        0.87%         160                             [native_write_msr+0 -> native_write_msr+16]     [kernel.kallsyms]
              0.48%             476        0.52%          95                               [native_read_msr+0 -> native_read_msr+29]     [kernel.kallsyms]
              0.31%             303        1.65%         303                                       [nmi_restore+0 -> nmi_restore+37]     [kernel.kallsyms]
              0.26%             255        1.39%         255               [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]     [kernel.kallsyms]
              0.24%             234        1.28%         234                                [end_repeat_nmi+67 -> end_repeat_nmi+83]     [kernel.kallsyms]
              0.23%             227        1.24%         227                     [__irqentry_text_end+96 -> __irqentry_text_end+126]     [kernel.kallsyms]
              0.20%             194        1.06%         194                      [native_set_debugreg+52 -> native_set_debugreg+56]     [kernel.kallsyms]
              0.11%             106        0.14%          26                         [native_sched_clock+0 -> native_sched_clock+98]     [kernel.kallsyms]
              0.10%              97        0.53%          97                     [trigger_load_balance+0 -> trigger_load_balance+67]     [kernel.kallsyms]
              0.09%              85        0.46%          85                      [get-dynamic-info.h:102 -> get-dynamic-info.h:111]            ld-2.27.so
  ...
              0.00%           92.7K        0.02%           4                                    [triad_loop.c:64 -> triad_loop.c:65]            triad_loop

The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at
the top of output.

After:

  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles                                                   [Program Block Range]         Shared Object
  # ...............  ..............  ...........  ..........  ......................................................................  ....................
  #
             94.35%           92.7K        0.02%           4                                    [triad_loop.c:64 -> triad_loop.c:65]            triad_loop
              0.81%             793        4.32%         793                                    [setup-vdso.h:34 -> setup-vdso.h:40]            ld-2.27.so
              0.49%             480        0.87%         160                             [native_write_msr+0 -> native_write_msr+16]     [kernel.kallsyms]
              0.48%             476        0.52%          95                               [native_read_msr+0 -> native_read_msr+29]     [kernel.kallsyms]
              0.31%             303        1.65%         303                                       [nmi_restore+0 -> nmi_restore+37]     [kernel.kallsyms]
              0.26%             255        1.39%         255               [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]     [kernel.kallsyms]
              0.24%             234        1.28%         234                                [end_repeat_nmi+67 -> end_repeat_nmi+83]     [kernel.kallsyms]
              0.23%             227        1.24%         227                     [__irqentry_text_end+96 -> __irqentry_text_end+126]     [kernel.kallsyms]
              0.20%             194        1.06%         194                      [native_set_debugreg+52 -> native_set_debugreg+56]     [kernel.kallsyms]
              0.11%             106        0.14%          26                         [native_sched_clock+0 -> native_sched_clock+98]     [kernel.kallsyms]
              0.10%              97        0.53%          97                     [trigger_load_balance+0 -> trigger_load_balance+67]     [kernel.kallsyms]
              0.09%              85        0.46%          85                      [get-dynamic-info.h:102 -> get-dynamic-info.h:111]            ld-2.27.so
              0.08%              82        0.06%          11          [intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627]     [kernel.kallsyms]
              0.08%              77        0.42%          77                          [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133]     [kernel.kallsyms]
              0.08%              74        0.10%          18                        [handle_pmi_common+271 -> handle_pmi_common+310]     [kernel.kallsyms]
              0.08%              74        0.40%          74                      [get-dynamic-info.h:131 -> get-dynamic-info.h:157]            ld-2.27.so
              0.07%              69        0.09%          17          [intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468]     [kernel.kallsyms]

Now the hottest block is reported at the top of output.

Fixes: b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
Signed-off-by: Jin Yao <yao.jin@...ux.intel.com>
---
 tools/perf/util/block-info.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 423ec69bda6c..5ecd4f401f32 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -201,7 +201,7 @@ static int block_total_cycles_pct_entry(struct perf_hpp_fmt *fmt,
 	double ratio = 0.0;
 
 	if (block_fmt->total_cycles)
-		ratio = (double)bi->cycles / (double)block_fmt->total_cycles;
+		ratio = (double)bi->cycles_aggr / (double)block_fmt->total_cycles;
 
 	return color_pct(hpp, block_fmt->width, 100.0 * ratio);
 }
@@ -216,9 +216,9 @@ static int64_t block_total_cycles_pct_sort(struct perf_hpp_fmt *fmt,
 	double l, r;
 
 	if (block_fmt->total_cycles) {
-		l = ((double)bi_l->cycles /
+		l = ((double)bi_l->cycles_aggr /
 			(double)block_fmt->total_cycles) * 100000.0;
-		r = ((double)bi_r->cycles /
+		r = ((double)bi_r->cycles_aggr /
 			(double)block_fmt->total_cycles) * 100000.0;
 		return (int64_t)l - (int64_t)r;
 	}
-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ