Message-ID: <14011562.uLZWGnKmhe@7950hx>
Date: Fri, 16 Jan 2026 22:50:36 +0800
From: Menglong Dong <menglong.dong@...ux.dev>
To: Qiliang Yuan <realwujing@...il.com>
Cc: memxor@...il.com, andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
daniel@...earbox.net, eddyz87@...il.com, haoluo@...gle.com,
john.fastabend@...il.com, jolsa@...nel.org, kpsingh@...nel.org,
linux-kernel@...r.kernel.org, martin.lau@...ux.dev, realwujing@...com,
sdf@...ichev.me, song@...nel.org, yonghong.song@...ux.dev,
yuanql9@...natelecom.cn, Alexei Starovoitov <alexei.starovoitov@...il.com>,
Qiliang Yuan <realwujing@...il.com>
Subject: Re: [PATCH v2] bpf/verifier: implement slab cache for verifier state list
On 2026/1/16 21:29, Qiliang Yuan wrote:
> The BPF verifier's state exploration logic in is_state_visited() frequently
> allocates and deallocates 'struct bpf_verifier_state_list' nodes. Currently,
> these allocations use generic kzalloc(), which leads to significant memory
> management overhead and page faults during high-complexity verification,
> especially in multi-core parallel scenarios.
>
> This patch introduces a dedicated 'bpf_verifier_state_list' slab cache to
> optimize these allocations, providing better speed, reduced fragmentation,
> and improved cache locality. All allocation and deallocation paths are
> migrated to use kmem_cache_zalloc() and kmem_cache_free().
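For reference, the pattern described above would look roughly like this (a sketch only; the actual cache variable name and init hook in the patch may differ):

```c
/* Dedicated slab cache for verifier state list nodes (sketch). */
static struct kmem_cache *bpf_vstate_list_cachep;

static int __init bpf_verifier_cache_init(void)
{
	/* KMEM_CACHE() derives size/alignment from the struct definition. */
	bpf_vstate_list_cachep = KMEM_CACHE(bpf_verifier_state_list, 0);
	return bpf_vstate_list_cachep ? 0 : -ENOMEM;
}

/* The kzalloc()/kfree() call sites in is_state_visited() then become: */
new_sl = kmem_cache_zalloc(bpf_vstate_list_cachep, GFP_KERNEL);
/* ... */
kmem_cache_free(bpf_vstate_list_cachep, sl);
```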
>
> Performance evaluation using a stress test (1000 conditional branches)
> executed in parallel on 32 CPU cores for 60 seconds shows significant
> improvements:
This patch is a bit of a mess. First, don't send a new version as a reply to
your previous version.
>
> Metric | Baseline | Patched | Delta (%)
> --------------------|---------------|---------------|----------
> Page Faults | 12,377,064 | 8,534,044 | -31.05%
> IPC | 1.17 | 1.22 | +4.27%
> CPU Cycles | 1,795.37B | 1,700.33B | -5.29%
> Instructions | 2,102.99B | 2,074.27B | -1.37%
And the test case is odd, too. What performance improvement can we infer
from this result? You run veristat in an infinite loop and record the
performance with perf for 60 s, so what does that tell us? Shouldn't you
run veristat a fixed number of times and measure the performance, such as
the total duration or the CPU cycles?
You optimize the verifier to reduce the verification time for your case,
which seems to be a complex BPF program that spends a lot of time in the
verifier. So what performance improvement do you actually get for that case?
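For example, a fixed-iteration run makes the numbers comparable between kernels (a sketch; the veristat path, object file name, and iteration count are placeholders):

```shell
# Run the same amount of work a fixed number of times under perf stat,
# instead of sampling an open-ended loop for 60 s. Paths are placeholders.
runs=100
perf stat -e cycles,instructions,page-faults -- \
    sh -c "for i in \$(seq 1 $runs); do \
        ./veristat verifier_state_stress.bpf.o >/dev/null; done"
```

That way both the baseline and the patched kernel report the cost of an identical workload, so elapsed time and cycle counts can be compared directly.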
>
> Detailed Benchmark Report:
> ==========================
> 1. Test Case Compilation (verifier_state_stress.c):
> clang -O2 -target bpf -D__TARGET_ARCH_x86 -I. -I./tools/include \
> -I./tools/lib/bpf -I./tools/testing/selftests/bpf -c \
> verifier_state_stress.c -o verifier_state_stress.bpf.o
>
[...]
>
> 60.036630614 seconds time elapsed
>
> Suggested-by: Kumar Kartikeya Dwivedi <memxor@...il.com>
> Suggested-by: Eduard Zingerman <eddyz87@...il.com>
You don't need to add all the reviewers here unless big changes were made.
> Signed-off-by: Qiliang Yuan <realwujing@...il.com>
> ---
> On Mon, 2026-01-12 at 19:15 +0100, Kumar Kartikeya Dwivedi wrote:
> > Did you run any numbers on whether this improves verification performance?
> > Without any compelling evidence, I would leave things as-is.
This is not how we write changelogs; please look at how other people do it.
>
> This version addresses the feedback by providing detailed 'perf stat'
> benchmarks and reproducible stress test code to demonstrate the
> compelling performance gains.
>