[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd3d594e-55e7-47c3-99f4-efc1a4688fb2@linux.dev>
Date: Wed, 31 Jan 2024 12:04:00 -0800
From: Yonghong Song <yonghong.song@...ux.dev>
To: Marco Elver <elver@...gle.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Mykola Lysenko <mykolal@...com>,
Shuah Khan <shuah@...nel.org>, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow
paths
On 1/31/24 6:18 AM, Marco Elver wrote:
> To allow the compiler to inline the bpf_local_storage_lookup() fast-
> path, factor it out by making bpf_local_storage_lookup() a static inline
> function and move the slow-path to bpf_local_storage_lookup_slowpath().
>
> Base on results from './benchs/run_bench_local_storage.sh' this produces
> improvements in throughput and latency in the majority of cases:
>
> | Hashmap Control
> | ===============
> | num keys: 10
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> | hits latency: 71.968 ns/op | 71.318 ns/op (-0.9%)
> | important_hits throughput: 13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> |
> | num keys: 1000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> | hits latency: 84.794 ns/op | 85.874 ns/op (+1.3%)
> | important_hits throughput: 11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> |
> | num keys: 10000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 7.113 ± 0.012 M ops/s | 7.037 ± 0.051 M ops/s (-1.1%)
> | hits latency: 140.581 ns/op | 142.113 ns/op (+1.1%)
> | important_hits throughput: 7.113 ± 0.012 M ops/s | 7.037 ± 0.051 M ops/s (-1.1%)
> |
> | num keys: 100000
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 4.793 ± 0.034 M ops/s | 4.990 ± 0.025 M ops/s (+4.1%)
> | hits latency: 208.623 ns/op | 200.401 ns/op (-3.9%)
> | important_hits throughput: 4.793 ± 0.034 M ops/s | 4.990 ± 0.025 M ops/s (+4.1%)
> |
> | num keys: 4194304
> | hashmap (control) sequential get:
> | <before> | <after>
> | hits throughput: 2.088 ± 0.008 M ops/s | 2.962 ± 0.004 M ops/s (+41.9%)
> | hits latency: 478.851 ns/op | 337.648 ns/op (-29.5%)
> | important_hits throughput: 2.088 ± 0.008 M ops/s | 2.962 ± 0.004 M ops/s (+41.9%)
> |
> | Local Storage
> | =============
> | num_maps: 1
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> | hits latency: 30.676 ns/op | 25.988 ns/op (-15.3%)
> | important_hits throughput: 32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> | hits latency: 27.054 ns/op | 22.807 ns/op (-15.7%)
> | important_hits throughput: 36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> |
> | num_maps: 10
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.078 ± 0.004 M ops/s | 37.813 ± 0.020 M ops/s (+17.9%)
> | hits latency: 31.174 ns/op | 26.446 ns/op (-15.2%)
> | important_hits throughput: 3.208 ± 0.000 M ops/s | 3.781 ± 0.002 M ops/s (+17.9%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 34.564 ± 0.011 M ops/s | 40.082 ± 0.037 M ops/s (+16.0%)
> | hits latency: 28.932 ns/op | 24.949 ns/op (-13.8%)
> | important_hits throughput: 12.344 ± 0.004 M ops/s | 14.315 ± 0.013 M ops/s (+16.0%)
> |
> | num_maps: 16
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 32.493 ± 0.023 M ops/s | 38.147 ± 0.029 M ops/s (+17.4%)
> | hits latency: 30.776 ns/op | 26.215 ns/op (-14.8%)
> | important_hits throughput: 2.031 ± 0.001 M ops/s | 2.384 ± 0.002 M ops/s (+17.4%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 34.380 ± 0.521 M ops/s | 41.605 ± 0.095 M ops/s (+21.0%)
> | hits latency: 29.087 ns/op | 24.035 ns/op (-17.4%)
> | important_hits throughput: 10.939 ± 0.166 M ops/s | 13.238 ± 0.030 M ops/s (+21.0%)
> |
> | num_maps: 17
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 28.748 ± 0.028 M ops/s | 32.248 ± 0.080 M ops/s (+12.2%)
> | hits latency: 34.785 ns/op | 31.009 ns/op (-10.9%)
> | important_hits throughput: 1.693 ± 0.002 M ops/s | 1.899 ± 0.005 M ops/s (+12.2%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 31.313 ± 0.030 M ops/s | 35.911 ± 0.020 M ops/s (+14.7%)
> | hits latency: 31.936 ns/op | 27.847 ns/op (-12.8%)
> | important_hits throughput: 9.533 ± 0.009 M ops/s | 10.933 ± 0.006 M ops/s (+14.7%)
> |
> | num_maps: 24
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 18.475 ± 0.027 M ops/s | 19.000 ± 0.006 M ops/s (+2.8%)
> | hits latency: 54.127 ns/op | 52.632 ns/op (-2.8%)
> | important_hits throughput: 0.770 ± 0.001 M ops/s | 0.792 ± 0.000 M ops/s (+2.9%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 21.361 ± 0.028 M ops/s | 22.388 ± 0.099 M ops/s (+4.8%)
> | hits latency: 46.814 ns/op | 44.667 ns/op (-4.6%)
> | important_hits throughput: 6.009 ± 0.008 M ops/s | 6.298 ± 0.028 M ops/s (+4.8%)
> |
> | num_maps: 32
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 14.220 ± 0.006 M ops/s | 14.168 ± 0.020 M ops/s (-0.4%)
> | hits latency: 70.323 ns/op | 70.580 ns/op (+0.4%)
> | important_hits throughput: 0.445 ± 0.000 M ops/s | 0.443 ± 0.001 M ops/s (-0.4%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 17.250 ± 0.011 M ops/s | 16.650 ± 0.021 M ops/s (-3.5%)
> | hits latency: 57.971 ns/op | 60.061 ns/op (+3.6%)
> | important_hits throughput: 4.815 ± 0.003 M ops/s | 4.647 ± 0.006 M ops/s (-3.5%)
> |
> | num_maps: 100
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 5.212 ± 0.012 M ops/s | 5.878 ± 0.004 M ops/s (+12.8%)
> | hits latency: 191.877 ns/op | 170.116 ns/op (-11.3%)
> | important_hits throughput: 0.052 ± 0.000 M ops/s | 0.059 ± 0.000 M ops/s (+13.5%)
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 6.521 ± 0.053 M ops/s | 7.086 ± 0.010 M ops/s (+8.7%)
> | hits latency: 153.343 ns/op | 141.116 ns/op (-8.0%)
> | important_hits throughput: 1.703 ± 0.014 M ops/s | 1.851 ± 0.003 M ops/s (+8.7%)
> |
> | num_maps: 1000
> | local_storage cache sequential get:
> | <before> | <after>
> | hits throughput: 0.357 ± 0.005 M ops/s | 0.325 ± 0.005 M ops/s (-9.0%)
> | hits latency: 2803.738 ns/op | 3076.923 ns/op (+9.7%)
> | important_hits throughput: 0.000 ± 0.000 M ops/s | 0.000 ± 0.000 M ops/s
> | local_storage cache interleaved get:
> | <before> | <after>
> | hits throughput: 0.434 ± 0.007 M ops/s | 0.447 ± 0.007 M ops/s (+3.0%)
> | hits latency: 2306.539 ns/op | 2237.687 ns/op (-3.0%)
> | important_hits throughput: 0.109 ± 0.002 M ops/s | 0.112 ± 0.002 M ops/s (+2.8%)
>
> Signed-off-by: Marco Elver <elver@...gle.com>
Thanks for the patch. This indeed could improve performance with inlining.
In Meta, we started to use LTO kernel which already have
bpf_local_storage_lookup() cross-file inlined into
bpf_sk_storage_lookup(), so we won't be able to reap
this benefit.
In the subject, please use tag [PATCH bpf-next] so CI can
pick it up for testing properly.
[...]
Powered by blists - more mailing lists