linux-kernel - [PATCH bpf-next] bpf/test_run: increase Page Pool's ptr

Message-ID: <20240214153838.4159970-1-aleksander.lobakin@intel.com>
Date: Wed, 14 Feb 2024 16:38:38 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>,
	Toke Høiland-Jørgensen <toke@...hat.com>,
	Martin KaFai Lau <martin.lau@...ux.dev>,
	Jakub Kicinski <kuba@...nel.org>,
	Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	bpf@...r.kernel.org,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH bpf-next] bpf/test_run: increase Page Pool's ptr_ring size in live frames mode

Currently, when running xdp-trafficgen, test_run creates page_pools with
a ptr_ring size of %NAPI_POLL_WEIGHT (64), the default batch size.
This might work fine if XDP Tx queues are polled with a budget limit.
However, we often clean them with no limit to ensure maximum free space
when sending.
For example, in ice and idpf (upcoming), we use "lazy" cleaning, i.e. we
clean an XDP Tx queue only when its free space drops below 1/4 of the
queue size. Let's take a ring size of 512 as an example: 3/4 of the ring
is 384, and oftentimes, when we enter the cleaning function, this whole
amount is ready to be completed (or 256, or 192; the exact number doesn't
matter).
Then we call xdp_return_frame_bulk(), and after the 64th frame,
page_pool_put_page_bulk() starts returning pages to the page allocator
because the ptr_ring is already full. put_page(), alloc_page() et al.
start consuming a ton of CPU time and end up at the top of the perf top
output.

Let's not limit the ptr_ring to 64 entries for no real reason, and allow
more pages to be recycled. Just don't set page_pool_params::pool_size and
let the Page Pool core pick its default of 1024 entries (I don't believe
there are real use cases for cleaning more than that many descriptors at
once).
After the change, the MM layer disappears from the perf top output and
all pages get recycled to the PP. On my test setup on idpf with the
default ring size (512), this gives +80% Tx performance with no visible
increase in memory consumption.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
---
 net/bpf/test_run.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index dfd919374017..1ad4f1ddcb88 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -163,7 +163,6 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c
 	struct page_pool_params pp_params = {
 		.order = 0,
 		.flags = 0,
-		.pool_size = xdp->batch_size,
 		.nid = NUMA_NO_NODE,
 		.init_callback = xdp_test_run_init_page,
 		.init_arg = xdp,
-- 
2.43.0