Message-ID: <mb61pttd2bzks.fsf@kernel.org>
Date: Wed, 23 Oct 2024 15:37:07 +0000
From: Puranjay Mohan <puranjay@...nel.org>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Albert Ou <aou@...s.berkeley.edu>, Alexei Starovoitov <ast@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Andrii Nakryiko
<andrii@...nel.org>, bpf@...r.kernel.org, Daniel Borkmann
<daniel@...earbox.net>, "David S. Miller" <davem@...emloft.net>, "Eduard
Zingerman" <eddyz87@...il.com>, Eric Dumazet <edumazet@...gle.com>, Hao Luo
<haoluo@...gle.com>, Helge Deller <deller@....de>, Jakub Kicinski
<kuba@...nel.org>, "James E.J. Bottomley"
<James.Bottomley@...senpartnership.com>, Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-riscv@...ts.infradead.org, Martin KaFai Lau <martin.lau@...ux.dev>,
Mykola Lysenko <mykolal@...com>, netdev@...r.kernel.org, Palmer Dabbelt
<palmer@...belt.com>, Paolo Abeni <pabeni@...hat.com>, Paul Walmsley
<paul.walmsley@...ive.com>, Shuah Khan <shuah@...nel.org>, Song Liu
<song@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Yonghong Song
<yonghong.song@...ux.dev>
Subject: Re: [PATCH bpf-next 4/5] selftests/bpf: Add benchmark for
bpf_csum_diff() helper

Puranjay Mohan <puranjay@...nel.org> writes:
> Andrii Nakryiko <andrii.nakryiko@...il.com> writes:
>
>> On Tue, Oct 22, 2024 at 3:21 AM Puranjay Mohan <puranjay@...nel.org> wrote:
>>>
>>> Andrii Nakryiko <andrii.nakryiko@...il.com> writes:
>>>
>>> > On Mon, Oct 21, 2024 at 5:22 AM Puranjay Mohan <puranjay@...nel.org> wrote:
>>> >>
>>> >> Add a microbenchmark for bpf_csum_diff() helper. This benchmark works by
>>> >> filling a 4KB buffer with random data and calculating the internet
>>> >> checksum on different parts of this buffer using bpf_csum_diff().
>>> >>
>>> >> Example run using ./benchs/run_bench_csum_diff.sh on x86_64:
>>> >>
>>> >> [bpf]$ ./benchs/run_bench_csum_diff.sh
>>> >> 4 2.296 ± 0.066M/s (drops 0.000 ± 0.000M/s)
>>> >> 8 2.320 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>>> >> 16 2.315 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>>> >> 20 2.318 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>>> >> 32 2.308 ± 0.003M/s (drops 0.000 ± 0.000M/s)
>>> >> 40 2.300 ± 0.029M/s (drops 0.000 ± 0.000M/s)
>>> >> 64 2.286 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>>> >> 128 2.250 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>>> >> 256 2.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
>>> >> 512 2.023 ± 0.055M/s (drops 0.000 ± 0.000M/s)
>>> >
>>> > You are not benchmarking bpf_csum_diff(), you are benchmarking how
>>> > often you can call bpf_prog_test_run(). Add some batching on the BPF
>>> > side. These numbers tell you that there is no difference between
>>> > calculating the checksum for 4 bytes and for 512; didn't that seem
>>> > strange to you?
>>>
>>> This didn't seem strange to me because, if you look at the tables I
>>> added to the cover letter, there is a clear improvement after
>>> optimizing the helper, and arm64 even shows a linear drop going from 4
>>> bytes to 512 bytes, even after the optimization.
>>>
>>
>> Regardless of optimization, it's strange that throughput barely
>> differs when you vary the amount of work by more than 100x. This
>> wouldn't be strange if this checksum calculation were some sort of
>> cryptographic hash, where it's intentional to have the same timing
>> regardless of the amount of work, or something along those lines. But I
>> don't think that's the case here.
>>
>> But as it is right now, this benchmark is benchmarking
>> bpf_prog_test_run(), as I mentioned, which seems to be bottlenecking
>> at about 2mln/s throughput for your machine. bpf_csum_diff()'s
>> overhead is trivial compared to bpf_prog_test_run() overhead and
>> syscall/context switch overhead.
>>
>> We shouldn't add a benchmark that doesn't benchmark the right thing.
>> So just add a bpf_for(i, 0, 100) loop doing bpf_csum_diff(), and then
>> do atomic increment *after* the loop (to minimize atomics overhead).
>
> Thanks, now I understand what you meant. Will add the bpf_for() in the
> next version.

I have decided to drop this patch: even after adding the bpf_for() loop,
the difference between 4B and 512B is not that significant, so
benchmarking bpf_csum_diff() with this trigger-based framework is not
useful. v2 will therefore not include this patch, but the cover letter
will still have the tables showing the difference before/after the
optimization.
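
For reference, the batching followed the shape Andrii suggested: a
bpf_for() loop around bpf_csum_diff() with a single counter increment
after the loop. The sketch below is only illustrative and rests on my
own assumptions (a tc program driven by bpf_prog_test_run(); the
csum_diff_bench/buf/hits names, chunk size, and iteration count are
made up, not the actual selftest code):

/* Hypothetical sketch of a batched bpf_csum_diff() benchmark program.
 * bpf_for() comes from the selftests' bpf_experimental.h (open-coded
 * iterators); all names and sizes here are illustrative assumptions.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

#define BUF_LEN   4096	/* whole buffer, filled with random data by userspace */
#define CHUNK_LEN 512	/* bytes checksummed per bpf_csum_diff() call */

long hits = 0;				/* read by the userspace bench harness */
__be32 buf[BUF_LEN / sizeof(__be32)];	/* written by userspace before the run */

SEC("tc")
int csum_diff_bench(struct __sk_buff *ctx)
{
	__wsum seed = 0;
	int i;

	/* Batch many helper calls per bpf_prog_test_run() invocation so the
	 * syscall/test-run overhead no longer dominates the measurement.
	 */
	bpf_for(i, 0, 100)
		seed = bpf_csum_diff(NULL, 0, buf, CHUNK_LEN, seed);

	/* Count once per batch, after the loop, to keep atomics cheap. */
	__sync_fetch_and_add(&hits, 1);
	return 0;
}

char _license[] SEC("license") = "GPL";
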
Thanks,
Puranjay