Message-ID: <CAHS8izPyzJvchqFNrRjY95D=41nya8Tmvx1eS9n0ijtHcUUETA@mail.gmail.com>
Date: Mon, 16 Jun 2025 14:11:13 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Shuah Khan <shuah@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Toke Høiland-Jørgensen <toke@...e.dk>,
Ignat Korchagin <ignat@...udflare.com>
Subject: Re: [PATCH net-next v4] page_pool: import Jesper's page_pool benchmark
On Mon, Jun 16, 2025 at 2:29 AM Jesper Dangaard Brouer <hawk@...nel.org> wrote:
> On 15/06/2025 22.59, Mina Almasry wrote:
> > From: Jesper Dangaard Brouer <hawk@...nel.org>
> >
> > We frequently consult Jesper's out-of-tree page_pool benchmark to
> > evaluate page_pool changes.
> >
> > Import the benchmark into the upstream Linux kernel tree so that (a)
> > we're all running the same version, (b) we pave the way for shared
> > improvements, and (c) we can maybe one day integrate it with nipa, if
> > possible.
> >
> > Import bench_page_pool_simple from commit 35b1716d0c30 ("Add
> > page_bench06_walk_all"), from this repository:
> > https://github.com/netoptimizer/prototype-kernel.git
> >
> > Changes done during upstreaming:
> > - Fix checkpatch issues.
> > - Remove the tasklet logic, which is not needed.
> > - Move under tools/testing
> > - Create ksft for the benchmark.
> > - Changed slightly how the benchmark gets built. Out of tree, time_bench
> > is built as an independent .ko. Here it is included in
> > bench_page_pool.ko
> >
> > Steps to run:
> >
> > ```
> > mkdir -p /tmp/run-pp-bench
> > make -C ./tools/testing/selftests/net/bench
> > make -C ./tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench
> > rsync --delete -avz --progress /tmp/run-pp-bench mina@...RVER:~/
> > ssh mina@...RVER << EOF
> > cd ~/run-pp-bench && sudo ./test_bench_page_pool.sh
> > EOF
> > ```
> >
> > Output:
> >
> > ```
> > (benchmark dmesg logs)
> >
>
> Something is off with benchmark numbers compared to the OOT version.
>
I assume you're comparing my results (my kernel config + my hardware +
upstream benchmark) with your results (your kernel config + your
hardware + OOT version). The problem may be in OOT vs upstream, but it
may just as well come down to different code/config/hardware.
> Adding my numbers below; they were run on my testlab with:
> - CPU E5-1650 v4 @ 3.60GHz
> - kernel: net.git v6.15-12438-gd9816ec74e6d
>
> > Fast path results:
> > no-softirq-page_pool01 Per elem: 11 cycles(tsc) 4.368 ns
> >
>
> Fast-path on your CPU is faster than on my CPU (22 cycles(tsc) 6.128 ns).
> What CPU is this?
My test setup is a Gcloud A3 VM (so virtualized). The CPU is:
cat /proc/cpuinfo
...
model name : Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz
>
> Type:no-softirq-page_pool01 Per elem: 22 cycles(tsc) 6.128 ns (step:0)
> - (measurement period time:0.061282924 sec time_interval:61282924)
> - (invoke count:10000000 tsc_interval:220619745)
>
> > ptr_ring results:
> > no-softirq-page_pool02 Per elem: 527 cycles(tsc) 195.187 ns
>
> I'm surprised that the ptr_ring benchmark is so slow compared to my
> result (below): 60 cycles(tsc) 16.853 ns.
>
> Type:no-softirq-page_pool02 Per elem: 60 cycles(tsc) 16.853 ns (step:0)
> - (measurement period time:0.168535760 sec time_interval:168535760)
> - (invoke count:10000000 tsc_interval:606734160)
>
> Maybe your kernel is compiled with some CONFIG debug thing that makes it
> slower?
>
Yeah, I actually just checked and I have CONFIG_DEBUG_NET on in my
build, along with a lot of other debug configs.
Let me investigate here. Maybe trimming the debug configs and
double-checking my tree for debug logging I added will point to the
difference.
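For reference, something along these lines should show what's enabled
(a rough sketch; the list of configs to grep for is just illustrative):

```
# Sketch: grep the build config for debug options known to skew
# microbenchmarks (illustrative list, not exhaustive).
grep -E 'CONFIG_(DEBUG_NET|KASAN|KCSAN|PROVE_LOCKING|LOCKDEP|DEBUG_OBJECTS)=y' .config

# Turn the offenders off before rebuilding, e.g. with the in-tree helper:
# ./scripts/config -d DEBUG_NET -d KASAN -d PROVE_LOCKING
```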
I could also try to put both the OOT version and upstream version in
my tree and do a proper A/B comparison that way.
If you do get a chance to run this upstream version from your exact
tree and config, that would be a good A/B comparison as well.
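For convenience, roughly the steps from the patch description, run
directly on the test machine (a sketch; assumes the module builds
against your running kernel):

```
make -C tools/testing/selftests/net/bench
make -C tools/testing/selftests/net/bench install INSTALL_PATH=/tmp/run-pp-bench
cd /tmp/run-pp-bench && sudo ./test_bench_page_pool.sh
```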
> You can troubleshoot like this:
> - select the `no-softirq-page_pool02` test via run_flags=$((2#100)).
>
> # perf record -g modprobe bench_page_pool_simple run_flags=$((2#100)) loops=$((100*10**6))
> # perf report --no-children
>
Thanks, will do.
--
Thanks,
Mina