Message-ID: <87cybsr72w.fsf@toke.dk>
Date: Wed, 28 May 2025 19:14:15 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Mina Almasry <almasrymina@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, Jesper
Dangaard Brouer <hawk@...nel.org>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, Shuah Khan <shuah@...nel.org>, Ilias Apalodimas
<ilias.apalodimas@...aro.org>
Subject: Re: [PATCH RFC net-next v2] page_pool: import Jesper's page_pool
benchmark

Arnaldo Carvalho de Melo <acme@...nel.org> writes:
> On Wed, May 28, 2025 at 11:28:54AM +0200, Toke Høiland-Jørgensen wrote:
>> Mina Almasry <almasrymina@...gle.com> writes:
>> > On Mon, May 26, 2025 at 5:51 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>> >> Back when you posted the first RFC, Jesper and I chatted about ways to
>> >> avoid the ugly "load module and read the output from dmesg" interface to
>> >> the test.
>
>> > I agree the existing interface is ugly.
>
>> >> One idea we came up with was to make the module include only the "inner"
>> >> functions for the benchmark, and expose those to BPF as kfuncs. Then the
>> >> test runner can be a BPF program that runs the tests, collects the data
>> >> and passes it to userspace via maps or a ringbuffer or something. That's
>> >> a nicer and more customisable interface than the printk output. And if
>> >> they're small enough, maybe we could even include the functions into the
>> >> page_pool code itself, instead of in a separate benchmark module?
>
>> >> WDYT of that idea? :)
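
[As an illustration of the kfunc idea sketched above: exposing the benchmark's
inner loop to BPF could look roughly like this on the kernel side. This is a
hedged, untested sketch; the function and set names are hypothetical, not from
the actual patch.]

```c
/* Sketch: exposing the benchmark's inner loop as a kfunc so a BPF
 * test runner can call it. All names here are illustrative. */
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>

/* Hypothetical inner benchmark function: runs the timed page_pool
 * alloc/free loop and returns the measured cost. */
__bpf_kfunc u64 bench_page_pool_fast_path(u32 loops)
{
	/* ... run the timed loop, return e.g. total cycles ... */
	return 0;
}

BTF_KFUNCS_START(bench_kfunc_ids)
BTF_ID_FLAGS(func, bench_page_pool_fast_path)
BTF_KFUNCS_END(bench_kfunc_ids)

static const struct btf_kfunc_id_set bench_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &bench_kfunc_ids,
};

static int __init bench_init(void)
{
	/* Register the kfunc for BPF "syscall" programs, which a
	 * userspace test runner can trigger on demand. */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL,
					 &bench_kfunc_set);
}
module_init(bench_init);
MODULE_LICENSE("GPL");
```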
>
>> > ...but this sounds like an enormous amount of effort for something
>> > that is a bit ugly but isn't THAT bad. Especially for me: I'm not
>> > enough of an expert to know how to implement what you're referring
>> > to off the top of my head. I'm normally open to spending the time,
>> > but this is not that high on my todo list and I have limited
>> > bandwidth to resolve it :(
>
>> > I also feel that this is something that could be improved post merge.
>
> agreed
>
>> > I think it's very beneficial to have this merged in some form that can
>> > be improved later. Byungchul is making a lot of changes to these mm
>> > things, and it would be nice to have an easy way to run the benchmark
>> > in tree and maybe even get automated results from NIPA. If we could
>> > agree on an MVP that is appropriate to merge without too much scope
>> > creep, that would be ideal from my side at least.
>
>> Right, fair. I guess we can merge it as-is, and then investigate whether
>> we can move it to BPF-based (or maybe 'perf bench' - Cc acme) later :)
>
> tl;dr: I'd advise merging it as-is, then kfunc'ifying parts of it and
> using it from a 'perf bench' suite.
>
> Yeah, the model would be what I did for uprobes, but even then there is
> a selftests-based uprobes benchmark ;-)
>
> The 'perf bench' part, that calls into the skel:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/bench/uprobe.c
>
> The skel:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_skel/bench_uprobe.bpf.c
>
> While this one is just to generate BPF load to measure the impact on
> uprobes, for your case it would involve using a ring buffer to
> communicate from the skel (BPF/kernel side) to the userspace part,
> similar to what is done in various other BPF based perf tooling
> available in:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/bpf_skel
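
[As an illustration of the ring-buffer plumbing described above: the BPF skel
side could push benchmark results through a BPF ring buffer along these lines.
A hedged sketch only; the map, struct, and kfunc names are made up, and
`vmlinux.h` is assumed to be generated as usual for skel builds.]

```c
/* Sketch: BPF program (skel side) pushing one benchmark result record
 * through a BPF ring buffer to userspace. Names are illustrative. */
#include "vmlinux.h"          /* generated BTF header, as in other skels */
#include <bpf/bpf_helpers.h>

struct bench_event {
	__u64 loops;
	__u64 cycles;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} events SEC(".maps");

SEC("syscall")
int run_bench(void *ctx)
{
	struct bench_event *e;

	e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e)
		return 0;

	e->loops = 10000000;
	/* Here the benchmark kfunc from the module would be called, e.g.
	 * e->cycles = bench_page_pool_fast_path(e->loops); */
	e->cycles = 0;

	bpf_ringbuf_submit(e, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```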
>
> Like at this line (BPF skel part):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/bpf_skel/off_cpu.bpf.c?h=perf-tools-next#n253
>
> The simplest part is in the canonical, standalone runqslower tool, also
> hosted in the kernel sources:
>
> BPF skel sending stuff to userspace:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/bpf/runqslower/runqslower.bpf.c#n99
>
> The userspace part that reads it:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/bpf/runqslower/runqslower.c#n90
>
> This is a callback that gets invoked for every event the BPF skel
> produces, from this loop:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/bpf/runqslower/runqslower.c#n162
>
> That handle_event callback was associated via:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/bpf/runqslower/runqslower.c#n153
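
[runqslower itself uses a perf buffer, as the links above show; with the BPF
ring buffer suggested earlier in the thread, the equivalent userspace side
would be roughly as below. Again a hedged, untested sketch: the map fd is
assumed to come from a loaded skeleton, and struct/function names are made up.]

```c
/* Sketch: userspace loop consuming benchmark events from a BPF ring
 * buffer via libbpf. Assumes the ring buffer map fd comes from a
 * loaded skeleton; names are illustrative. */
#include <stdio.h>
#include <bpf/libbpf.h>

struct bench_event {
	unsigned long long loops;
	unsigned long long cycles;
};

/* Callback invoked for every record the BPF side submits. */
static int handle_event(void *ctx, void *data, size_t len)
{
	const struct bench_event *e = data;

	printf("loops=%llu cycles=%llu (%.2f cycles/op)\n",
	       e->loops, e->cycles,
	       e->loops ? (double)e->cycles / e->loops : 0.0);
	return 0;
}

int consume_events(int map_fd)
{
	struct ring_buffer *rb;
	int err;

	rb = ring_buffer__new(map_fd, handle_event, NULL, NULL);
	if (!rb)
		return -1;

	/* Poll until the benchmark run has reported its results;
	 * a real tool would break out on a sentinel event or count. */
	while ((err = ring_buffer__poll(rb, 100 /* timeout, ms */)) >= 0)
		;

	ring_buffer__free(rb);
	return err;
}
```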
>
> There is a dissection I did about this process a long time ago, but
> still relevant, I think:
>
> http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF-producers-consumers/#/33
>
> The part explaining the interaction userspace/kernel starts here:
>
> http://oldvger.kernel.org/~acme/bpf/devconf.cz-2020-BPF-The-Status-of-BTF-producers-consumers/#/40
>
> (yeah, it's http, but then, it's _old_vger ;-)
>
> Doing it in perf is interesting because it gets widely packaged, so
> whatever you add to it gets visibility among people using 'perf bench'
> and also becomes available in most places; it would add to this
> collection:
>
> root@...ber:~# perf bench
> Usage:
> perf bench [<common options>] <collection> <benchmark> [<options>]
>
> # List of all available benchmark collections:
>
> sched: Scheduler and IPC benchmarks
> syscall: System call benchmarks
> mem: Memory access benchmarks
> numa: NUMA scheduling and MM benchmarks
> futex: Futex stressing benchmarks
> epoll: Epoll stressing benchmarks
> internals: Perf-internals benchmarks
> breakpoint: Breakpoint benchmarks
> uprobe: uprobe benchmarks
> all: All benchmarks
>
> root@...ber:~#
>
> the 'perf bench' that uses BPF skel:
>
> root@...ber:~# perf bench uprobe baseline
> # Running 'uprobe/baseline' benchmark:
> # Executed 1,000 usleep(1000) calls
> Total time: 1,050,383 usecs
>
> 1,050.383 usecs/op
> root@...ber:~# perf trace --summary perf bench uprobe trace_printk
> # Running 'uprobe/trace_printk' benchmark:
> # Executed 1,000 usleep(1000) calls
> Total time: 1,053,082 usecs
>
> 1,053.082 usecs/op
>
> Summary of events:
>
> uprobe-trace_pr (1247691), 3316 events, 96.9%
>
>    syscall            calls  errors    total      min      avg      max  stddev
>                                       (msec)   (msec)   (msec)   (msec)     (%)
>    ---------------  ------- -------  ------- -------- -------- -------- -------
>    clock_nanosleep     1000       0 1101.236    1.007    1.101   50.939   4.53%
>    close                 98       0   32.979    0.001    0.337   32.821  99.52%
>    perf_event_open        1       0   18.691   18.691   18.691   18.691   0.00%
>    mmap                 209       0    0.567    0.001    0.003    0.007   2.59%
>    bpf                   38       2    0.380    0.000    0.010    0.092  28.38%
>    openat                65       0    0.171    0.001    0.003    0.012   7.14%
>    mprotect              56       0    0.141    0.001    0.003    0.008   6.86%
>    read                  68       0    0.082    0.001    0.001    0.010  11.60%
>    fstat                 65       0    0.056    0.001    0.001    0.003   5.40%
>    brk                   10       0    0.050    0.001    0.005    0.012  24.29%
>    pread64                8       0    0.042    0.001    0.005    0.021  49.29%
> <SNIP other syscalls>
>
> root@...ber:~#

Cool, thanks for the pointers! I guess we'd need to restructure the
functions to be benchmarked a bit, but that should be doable.
-Toke