Message-Id: <20260116045839.23743-1-realwujing@gmail.com>
Date: Fri, 16 Jan 2026 12:58:39 +0800
From: Qiliang Yuan <realwujing@...il.com>
To: Eduard Zingerman <eddyz87@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>
Cc: bpf@...r.kernel.org,
linux-kernel@...r.kernel.org,
john.fastabend@...il.com,
martin.lau@...ux.dev,
song@...nel.org,
yonghong.song@...ux.dev,
kpsingh@...nel.org,
sdf@...ichev.me,
haoluo@...gle.com,
jolsa@...nel.org,
Qiliang Yuan <realwujing@...il.com>
Subject: [PATCH v2] bpf/verifier: optimize precision backtracking by skipping precise bits

Currently, the precision backtracking logic in __mark_chain_precision()
does not check whether the target register or stack slot is already
marked precise. This can lead to redundant work when multiple paths
require the same register to be precise.

This patch moves the "skip if already precise" logic into the setup of
the core backtracking loop in __mark_chain_precision(). This ensures
that all entry points (mark_chain_precision(), propagate_precision())
benefit from the optimization and keeps the call sites clean.

Specifically:
- In __mark_chain_precision(), before starting the backtracking loop,
  check whether the initial register/stack slot is already precise.
- For batch propagation (used by propagate_precision()), iterate over
  the requested masks and clear bits that are already precise in the
  starting state.
- Remove the now-redundant checks in mark_chain_precision() and
  propagate_precision().

Performance results:

The optimization significantly reduces verification time by skipping
redundant backtracking for registers that are already marked precise.

Key improvements (duration):
- prog_nested_calls:          671us -> 292us (-56.48%)
- prog_non_constant_callback: 254us -> 111us (-56.30%)
- stack_check:                807us -> 411us (-49.07%)
- arena_strsearch:            120us ->  65us (-45.83%)
- prog_null_ctx:              216us -> 164us (-24.07%)

Total instruction and state counts remain identical across all tests
(0.00% change), confirming that the verification logic is unchanged.

Test steps and detailed raw performance data follow. The data was
collected with the following command:

$ sudo ./veristat -e duration,total_insns,total_states -o csv \
      bpf_flow.bpf.o bpf_loop.bpf.o arena_strsearch.bpf.o

Baseline (at b54345928fa1):
file_name,prog_name,verdict,duration,total_insns,total_states
arena_strsearch.bpf.o,arena_strsearch,failure,120,20,2
bpf_flow.bpf.o,_dissect,success,465,211,13
bpf_flow.bpf.o,flow_dissector_0,success,2408,1461,68
bpf_flow.bpf.o,flow_dissector_1,success,2698,1567,59
bpf_flow.bpf.o,flow_dissector_2,success,2010,1244,56
bpf_flow.bpf.o,flow_dissector_3,success,2250,1498,57
bpf_flow.bpf.o,flow_dissector_4,success,343,259,4
bpf_flow.bpf.o,flow_dissector_5,success,674,416,21
bpf_loop.bpf.o,prog_invalid_flags,success,292,50,5
bpf_loop.bpf.o,prog_nested_calls,success,671,145,19
bpf_loop.bpf.o,prog_non_constant_callback,success,254,41,5
bpf_loop.bpf.o,prog_null_ctx,success,216,22,3
bpf_loop.bpf.o,stack_check,success,807,325,25
bpf_loop.bpf.o,test_prog,success,493,64,7

Patched:
file_name,prog_name,verdict,duration,total_insns,total_states
arena_strsearch.bpf.o,arena_strsearch,failure,65,20,2
bpf_flow.bpf.o,_dissect,success,467,211,13
bpf_flow.bpf.o,flow_dissector_0,success,2392,1461,68
bpf_flow.bpf.o,flow_dissector_1,success,2722,1567,59
bpf_flow.bpf.o,flow_dissector_2,success,2050,1244,56
bpf_flow.bpf.o,flow_dissector_3,success,2258,1498,57
bpf_flow.bpf.o,flow_dissector_4,success,342,259,4
bpf_flow.bpf.o,flow_dissector_5,success,702,416,21
bpf_loop.bpf.o,prog_invalid_flags,success,239,50,5
bpf_loop.bpf.o,prog_nested_calls,success,292,145,19
bpf_loop.bpf.o,prog_non_constant_callback,success,111,41,5
bpf_loop.bpf.o,prog_null_ctx,success,164,22,3
bpf_loop.bpf.o,stack_check,success,411,325,25
bpf_loop.bpf.o,test_prog,success,484,64,7

Comparison (veristat -C):

$ ./veristat -C prec_skip_baseline.csv prec_skip_patched.csv
File Program Verdict (A) Verdict (B) Verdict (DIFF) Duration (us) (A) Duration (us) (B) Duration (us) (DIFF) Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
--------------------- -------------------------- ----------- ----------- -------------- ----------------- ----------------- -------------------- --------- --------- ------------ ---------- ---------- -------------
arena_strsearch.bpf.o arena_strsearch failure failure MATCH 120 65 -55 (-45.83%) 20 20 +0 (+0.00%) 2 2 +0 (+0.00%)
bpf_flow.bpf.o _dissect success success MATCH 465 467 +2 (+0.43%) 211 211 +0 (+0.00%) 13 13 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_0 success success MATCH 2408 2392 -16 (-0.66%) 1461 1461 +0 (+0.00%) 68 68 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_1 success success MATCH 2698 2722 +24 (+0.89%) 1567 1567 +0 (+0.00%) 59 59 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_2 success success MATCH 2010 2050 +40 (+1.99%) 1244 1244 +0 (+0.00%) 56 56 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_3 success success MATCH 2250 2258 +8 (+0.36%) 1498 1498 +0 (+0.00%) 57 57 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_4 success success MATCH 343 342 -1 (-0.29%) 259 259 +0 (+0.00%) 4 4 +0 (+0.00%)
bpf_flow.bpf.o flow_dissector_5 success success MATCH 674 702 +28 (+4.15%) 416 416 +0 (+0.00%) 21 21 +0 (+0.00%)
bpf_loop.bpf.o prog_invalid_flags success success MATCH 292 239 -53 (-18.15%) 50 50 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_loop.bpf.o prog_nested_calls success success MATCH 671 292 -379 (-56.48%) 145 145 +0 (+0.00%) 19 19 +0 (+0.00%)
bpf_loop.bpf.o prog_non_constant_callback success success MATCH 254 111 -143 (-56.30%) 41 41 +0 (+0.00%) 5 5 +0 (+0.00%)
bpf_loop.bpf.o prog_null_ctx success success MATCH 216 164 -52 (-24.07%) 22 22 +0 (+0.00%) 3 3 +0 (+0.00%)
bpf_loop.bpf.o stack_check success success MATCH 807 411 -396 (-49.07%) 325 325 +0 (+0.00%) 25 25 +0 (+0.00%)
bpf_loop.bpf.o test_prog success success MATCH 493 484 -9 (-1.83%) 64 64 +0 (+0.00%) 7 7 +0 (+0.00%)

Suggested-by: Eduard Zingerman <eddyz87@...il.com>
Signed-off-by: Qiliang Yuan <realwujing@...il.com>
---
v1 -> v2:
- Move "skip if already precise" logic into the core backtracking engine
  __mark_chain_precision() as suggested by Eduard Zingerman.
- Add detailed performance benchmark results and veristat comparison.
- Clean up call sites in mark_chain_precision() and propagate_precision().

 kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f0ca69f888fa..340d972bd833 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4765,14 +4765,37 @@ static int __mark_chain_precision(struct bpf_verifier_env *env,
 	 * slot, but don't set precise flag in current state, as precision
 	 * tracking in the current state is unnecessary.
 	 */
-	func = st->frame[bt->frame];
 	if (regno >= 0) {
-		reg = &func->regs[regno];
+		reg = &st->frame[bt->frame]->regs[regno];
 		if (reg->type != SCALAR_VALUE) {
 			verifier_bug(env, "backtracking misuse");
 			return -EFAULT;
 		}
+		if (reg->precise)
+			return 0;
 		bt_set_reg(bt, regno);
+	} else {
+		for (fr = bt->frame; fr >= 0; fr--) {
+			u32 reg_mask = bt_frame_reg_mask(bt, fr);
+			u64 stack_mask = bt_frame_stack_mask(bt, fr);
+			DECLARE_BITMAP(mask, 64);
+
+			func = st->frame[fr];
+			if (reg_mask) {
+				bitmap_from_u64(mask, reg_mask);
+				for_each_set_bit(i, mask, 32) {
+					if (func->regs[i].precise)
+						bt_clear_frame_reg(bt, fr, i);
+				}
+			}
+			if (stack_mask) {
+				bitmap_from_u64(mask, stack_mask);
+				for_each_set_bit(i, mask, 64) {
+					if (func->stack[i].spilled_ptr.precise)
+						bt_clear_frame_slot(bt, fr, i);
+				}
+			}
+		}
 	}
 
 	if (bt_empty(bt))
--
2.39.5