lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 9 Sep 2022 21:05:20 +0900
From:   Masami Hiramatsu (Google) <mhiramat@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Tom Zanussi <zanussi@...nel.org>
Subject: Re: [PATCH v2 0/4] tracing: Have filters and histograms use a call
 table instead of pointers

On Tue, 06 Sep 2022 18:53:13 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> While looking at the histogram and filter code, I realized that it's filled
> with function pointers. With retpolines causing a big slowdown, I thought
> that was problematic. Thus, I decided to see what would happen if I changed
> the function pointers into enums, and instead called a single function
> that did a switch on those enums and called the necessary functions
> directly. The results were pretty clear.
> 
> The first patch was to update the trace event benchmark event to include
> a integer value "delta" of the delta that it took to complete
> (it currently only shows the delta as part of a string). By doing
> so, I could benchmark the histogram and filter logic with it.
> 
> Before this series, the histogram with a single filter (to ignore the
> first event, which has a delta of zero), had:
> 
> # event histogram
> #
> # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
> #
> 
> { delta:        129 } hitcount:       2213
> { delta:        130 } hitcount:     285965
> { delta:        131 } hitcount:    1146545
> { delta:        132 } hitcount:    5185432
> { delta:        133 } hitcount:   19896215
> { delta:        134 } hitcount:   53118616
> { delta:        135 } hitcount:   83816709
> { delta:        136 } hitcount:   68329562
> { delta:        137 } hitcount:   41859349
> { delta:        138 } hitcount:   46257797
> { delta:        139 } hitcount:   54400831
> { delta:        140 } hitcount:   72875007
> { delta:        141 } hitcount:   76193272
> { delta:        142 } hitcount:   49504263
> { delta:        143 } hitcount:   38821072
> { delta:        144 } hitcount:   47702679
> { delta:        145 } hitcount:   41357297
> { delta:        146 } hitcount:   22058238
> { delta:        147 } hitcount:    9720002
> { delta:        148 } hitcount:    3193542
> { delta:        149 } hitcount:     927030
> { delta:        150 } hitcount:     850772
> { delta:        151 } hitcount:    1477380
> { delta:        152 } hitcount:    2687977
> { delta:        153 } hitcount:    2865985
> { delta:        154 } hitcount:    1977492
> { delta:        155 } hitcount:    2475607
> { delta:        156 } hitcount:    3403612
> { delta:        157 } hitcount:    2264011
> { delta:        158 } hitcount:    1096214
> { delta:        159 } hitcount:     504653
> { delta:        160 } hitcount:     218869
> { delta:        161 } hitcount:     103246
> [..]
> 
> Where the bulk was around 142ns, and the fastest time was 129ns.
> 
> After this series:
> 
> # event histogram
> #
> # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active]
> #
> 
> { delta:        103 } hitcount:         60
> { delta:        104 } hitcount:      16966
> { delta:        105 } hitcount:     396625
> { delta:        106 } hitcount:    3223400
> { delta:        107 } hitcount:   12053754
> { delta:        108 } hitcount:   20241711
> { delta:        109 } hitcount:   14850200
> { delta:        110 } hitcount:    4946599
> { delta:        111 } hitcount:    3479315
> { delta:        112 } hitcount:   18698299
> { delta:        113 } hitcount:   62388733
> { delta:        114 } hitcount:   95803834
> { delta:        115 } hitcount:   58278130
> { delta:        116 } hitcount:   15364800
> { delta:        117 } hitcount:    5586866
> { delta:        118 } hitcount:    2346880
> { delta:        119 } hitcount:    1131091
> { delta:        120 } hitcount:     620896
> { delta:        121 } hitcount:     236652
> { delta:        122 } hitcount:     105957
> { delta:        123 } hitcount:     119107
> { delta:        124 } hitcount:      54494
> { delta:        125 } hitcount:      63856
> { delta:        126 } hitcount:      64454
> { delta:        127 } hitcount:      34818
> { delta:        128 } hitcount:      41446
> { delta:        129 } hitcount:      51242
> { delta:        130 } hitcount:      28361
> { delta:        131 } hitcount:      23926
> { delta:        132 } hitcount:      22253
> { delta:        133 } hitcount:      16994
> { delta:        134 } hitcount:      14970
> { delta:        135 } hitcount:      13464
> { delta:        136 } hitcount:      11452
> { delta:        137 } hitcount:      12212
> { delta:        138 } hitcount:      12280
> { delta:        139 } hitcount:       9127
> { delta:        140 } hitcount:       9553
> 
> Where the bulk was around 114ns and the fast time was 103ns.
> 
> That's almost a 20% speedup!!!
> 
> Changes since v1: https://lore.kernel.org/all/20220823214606.344269352@goodmis.org/
> 
>  - Fixed combining constants (Masami Hiramatsu)

This looks good to me and confirmed that the previous issue is solved.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@...nel.org>

Thank you!

> 
> Steven Rostedt (Google) (4):
>       tracing: Add numeric delta time to the trace event benchmark
>       tracing/hist: Call hist functions directly via a switch statement
>       tracing: Move struct filter_pred into trace_events_filter.c
>       tracing/filter: Call filter predicate functions directly via a switch statement
> 
> ----
>  kernel/trace/trace.h               |  13 --
>  kernel/trace/trace_benchmark.c     |   2 +-
>  kernel/trace/trace_benchmark.h     |   8 +-
>  kernel/trace/trace_events_filter.c | 239 ++++++++++++++++++++++++-----------
>  kernel/trace/trace_events_hist.c   | 246 +++++++++++++++++++++++++------------
>  5 files changed, 343 insertions(+), 165 deletions(-)


-- 
Masami Hiramatsu (Google) <mhiramat@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ