Message-ID: <a714ee96d0ad96bbc9d51037616e5c4e2790a8ec.camel@gmail.com>
Date: Mon, 19 Jan 2026 10:43:37 -0800
From: Eduard Zingerman <eddyz87@...il.com>
To: Qiliang Yuan <realwujing@...il.com>
Cc: andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
daniel@...earbox.net, haoluo@...gle.com, jolsa@...nel.org,
kpsingh@...nel.org, linux-kernel@...r.kernel.org, martin.lau@...ux.dev,
sdf@...ichev.me, song@...nel.org, yonghong.song@...ux.dev,
yuanql9@...natelecom.cn
Subject: Re: [PATCH v3] bpf/verifier: optimize precision backtracking by
skipping precise bits
On Sat, 2026-01-17 at 18:09 +0800, Qiliang Yuan wrote:
Hi Qiliang,
> 2. System-wide saturation profiling (32 cores):
> # Start perf in background
> sudo perf stat -a -- sleep 60 &
> # Start 32 parallel loops of veristat
> for i in {1..32}; do (while true; do ./veristat backtrack_stress.bpf.o > /dev/null; done &); done
I'm not sure system-wide testing is helpful in this context.
I'd suggest collecting stats for a single process, e.g. as follows:
perf stat -B --all-kernel -r10 -- ./veristat -q pyperf180.bpf.o
(Note: pyperf180 is a reasonably complex test for many purposes).
And then collecting profiling data:
perf record -o <somewhere-where-mmap-is-possible> \
--all-kernel --call-graph=dwarf --vmlinux=<path-to-vmlinux> \
-- ./veristat -q pyperf180.bpf.o
And then inspecting the profiling data using `perf report`.
What I see in the stats corroborates Yonghong's findings:
W/o the patch:
...
22293282 branch-misses # 2.8 % branch_miss_rate ( +- 1.25% ) (50.10%)
594485451 branches # 1012.5 M/sec branch_frequency ( +- 1.68% ) (66.67%)
1544503960 cpu-cycles # 2.6 GHz cycles_frequency ( +- 0.18% ) (67.02%)
3305212994 instructions # 2.1 instructions insn_per_cycle ( +- 2.04% ) (67.11%)
587496908 stalled-cycles-frontend # 0.38 frontend_cycles_idle ( +- 1.21% ) (66.39%)
0.60033 +- 0.00173 seconds time elapsed ( +- 0.29% )
With the patch:
...
22397789 branch-misses # 2.8 % branch_miss_rate ( +- 1.27% ) (50.37%)
596289399 branches # 1004.8 M/sec branch_frequency ( +- 1.59% ) (66.95%)
1546060617 cpu-cycles # 2.6 GHz cycles_frequency ( +- 0.16% ) (66.67%)
3325745352 instructions # 2.2 instructions insn_per_cycle ( +- 1.76% ) (66.61%)
588040713 stalled-cycles-frontend # 0.38 frontend_cycles_idle ( +- 1.23% ) (66.48%)
0.60697 +- 0.00201 seconds time elapsed ( +- 0.33% )
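(As a sanity check, the insn-per-cycle figures in both runs follow
directly from the raw counters; a minimal awk sketch, using the
instructions and cpu-cycles values quoted above, reproduces them:)

```shell
# Recompute insn_per_cycle = instructions / cpu-cycles from the
# raw counters in the two perf stat runs quoted above.
awk 'BEGIN {
    printf "w/o patch:  %.1f insn/cycle\n", 3305212994 / 1544503960;
    printf "with patch: %.1f insn/cycle\n", 3325745352 / 1546060617;
}'
# Matches perf's reported 2.1 and 2.2 insn_per_cycle, i.e. the
# patch makes no measurable difference on this workload.
```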
So, I'd suggest shelving this change for now.
If you take a look at the profiling data, you'll notice that the
low-hanging fruit is actually improving bpf_patch_insn_data():
it takes ~40% of the time, at least for this program.
This was actually discussed a very long time ago [1].
If you are interested in speeding up the verifier,
maybe consider taking a look?
Best regards,
Eduard Zingerman.
[1] https://lore.kernel.org/bpf/CAEf4BzY_E8MSL4mD0UPuuiDcbJhh9e2xQo2=5w+ppRWWiYSGvQ@mail.gmail.com/