Message-Id: <20260120015616.69224-1-realwujing@gmail.com>
Date: Tue, 20 Jan 2026 09:56:16 +0800
From: Qiliang Yuan <realwujing@...il.com>
To: eddyz87@...il.com
Cc: andrii.nakryiko@...il.com,
	andrii@...nel.org,
	ast@...nel.org,
	bpf@...r.kernel.org,
	daniel@...earbox.net,
	haoluo@...gle.com,
	jolsa@...nel.org,
	kpsingh@...nel.org,
	linux-kernel@...r.kernel.org,
	martin.lau@...ux.dev,
	realwujing@...il.com,
	sdf@...ichev.me,
	song@...nel.org,
	yonghong.song@...ux.dev,
	yuanql9@...natelecom.cn
Subject: [PATCH v3] bpf/verifier: optimize ID mapping reset in states_equal

The verifier uses an ID mapping table (struct bpf_idmap) during state
equivalence checks. Currently, reset_idmap_scratch performs a full memset
on the entire map (~4.7KB) in every call to states_equal.

Replace the memset with a counter of populated entries: reset_idmap_scratch()
now only clears the counter (an O(1) operation), and check_ids() appends new
pairs at the counter position and searches only the populated slots. This keeps
reset overhead minimal and bounds the search loop by the number of IDs actually
encountered in the current equivalence check.
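
For illustration, a minimal user-space sketch of the counter-bounded map
(simplified names and sizes, not the kernel code itself):

  #include <stdbool.h>
  #include <stdio.h>

  #define ID_MAP_SIZE 64                  /* stands in for BPF_ID_MAP_SIZE */

  struct id_pair { unsigned int old, cur; };

  struct idmap {
          unsigned int cnt;               /* number of populated slots */
          struct id_pair map[ID_MAP_SIZE];
  };

  /* O(1) reset: no memset of the backing array */
  static void reset_idmap(struct idmap *idmap)
  {
          idmap->cnt = 0;
  }

  /* search is bounded by cnt, i.e. by the IDs seen so far */
  static bool check_ids(unsigned int old_id, unsigned int cur_id,
                        struct idmap *idmap)
  {
          unsigned int i;

          if (old_id == 0)                /* cur_id == 0 as well */
                  return true;
          for (i = 0; i < idmap->cnt; i++) {
                  if (idmap->map[i].old == old_id)
                          return idmap->map[i].cur == cur_id;
                  if (idmap->map[i].cur == cur_id)
                          return false;
          }
          if (idmap->cnt < ID_MAP_SIZE) {
                  idmap->map[idmap->cnt].old = old_id;
                  idmap->map[idmap->cnt].cur = cur_id;
                  idmap->cnt++;
                  return true;
          }
          return false;                   /* out of slots */
  }

  int main(void)
  {
          struct idmap m;

          reset_idmap(&m);
          printf("%d\n", check_ids(1, 5, &m));  /* 1: records 1 -> 5 */
          printf("%d\n", check_ids(1, 6, &m));  /* 0: 1 already maps to 5 */
          return 0;
  }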

Benchmark results (system-wide 'perf stat' during a high-concurrency 'veristat'
stress test, 60s):

The numbers below, captured with perf while running veristat in parallel across
all CPU cores, show an approximately 9.3% reduction in executed instructions
and an 11% reduction in branches, consistent with the O(1) reset removing a
significant share of the verifier's work in state equivalence checks. Note that
system-wide profiling also captures unrelated background activity, so the
deltas are indicative rather than exact.

Metric          | Baseline      | Patched       | Delta
----------------|---------------|---------------|----------
Iterations      | 5710          | 5731          | +0.37%
Instructions    | 1.714 T       | 1.555 T       | -9.28%
Inst/Iter       | 300.2 M       | 271.3 M       | -9.63%
Cycles          | 1.436 T       | 1.335 T       | -7.03%
Branches        | 350.4 B       | 311.9 B       | -10.99%
Migrations      | 25,977        | 23,524        | -9.44%

Test Command:
  seq 1 2000000 | sudo perf stat -a -- \
    timeout 60s xargs -P $(nproc) -I {} ./veristat access_map_in_map.bpf.o

Detailed Performance Stats:

Baseline:
 Performance counter stats for 'system wide':

         6,735,538      context-switches                 #   3505.5 cs/sec  cs_per_second
      1,921,431.27 msec cpu-clock                        #     32.0 CPUs  CPUs_utilized
            25,977      cpu-migrations                   #     13.5 migrations/sec  migrations_per_second
         7,268,841      page-faults                      #   3783.0 faults/sec  page_fault_per_second
    18,662,357,052      branch-misses                    #      3.9 %  branch_miss_rate         (50.14%)
   350,411,558,023      branches                         #    182.4 M/sec  branch_frequency     (66.85%)
 1,435,774,261,319      cpu-cycles                       #      0.7 GHz  cycles_frequency       (66.95%)
 1,714,154,229,503      instructions                     #      1.2 instructions  insn_per_cycle  (66.86%)
   429,445,480,497      stalled-cycles-frontend          #     0.30 frontend_cycles_idle        (66.36%)

      60.035899231 seconds time elapsed

Patched:
 Performance counter stats for 'system wide':

         6,662,371      context-switches                 #   3467.3 cs/sec  cs_per_second
      1,921,497.78 msec cpu-clock                        #     32.0 CPUs  CPUs_utilized
            23,524      cpu-migrations                   #     12.2 migrations/sec  migrations_per_second
         7,783,064      page-faults                      #   4050.5 faults/sec  page_faults_per_second
    18,181,655,163      branch-misses                    #      4.3 %  branch_miss_rate         (50.15%)
   311,865,239,743      branches                         #    162.3 M/sec  branch_frequency     (66.86%)
 1,334,859,779,821      cpu-cycles                       #      0.7 GHz  cycles_frequency       (66.96%)
 1,555,086,465,845      instructions                     #      1.2 instructions  insn_per_cycle  (66.87%)
   407,666,712,045      stalled-cycles-frontend          #     0.31 frontend_cycles_idle        (66.35%)

      60.034702643 seconds time elapsed

Acked-by: Eduard Zingerman <eddyz87@...il.com>
Signed-off-by: Qiliang Yuan <realwujing@...il.com>
---
v3:
 - Remove Suggested-by tags per Eduard's feedback.
 - Add Eduard's Acked-by.
 - Credit Andrii Nakryiko for the further optimization suggestion.
 - Mention the limitation of system-wide profiling in commit message.
v2:
 - Further optimize ID mapping reset (suggested by Andrii Nakryiko) by
   using a simple counter reset and bounding the search loop.
v1:
 - Initial version using a watermark-based partial memset to optimize the
   ID mapping reset overhead.
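
For reference, a rough sketch of the v1 watermark idea (hypothetical hi_wm
field, reusing struct id_pair and ID_MAP_SIZE from the sketch in the commit
message; not the code that was actually posted):

  #include <string.h>

  struct idmap_v1 {
          unsigned int hi_wm;             /* one past the highest slot written */
          struct id_pair map[ID_MAP_SIZE];
  };

  static void reset_idmap_v1(struct idmap_v1 *idmap)
  {
          /* clear only the prefix populated since the last reset */
          memset(idmap->map, 0, idmap->hi_wm * sizeof(idmap->map[0]));
          idmap->hi_wm = 0;
  }
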
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 23 ++++++++++++++---------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 130bcbd66f60..8355b585cd18 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -692,6 +692,7 @@ struct bpf_id_pair {
 
 struct bpf_idmap {
 	u32 tmp_id_gen;
+	u32 cnt;
 	struct bpf_id_pair map[BPF_ID_MAP_SIZE];
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3135643d5695..6ec6d70e5ce7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18948,18 +18948,21 @@ static bool check_ids(u32 old_id, u32 cur_id, struct bpf_idmap *idmap)
 	if (old_id == 0) /* cur_id == 0 as well */
 		return true;
 
-	for (i = 0; i < BPF_ID_MAP_SIZE; i++) {
-		if (!map[i].old) {
-			/* Reached an empty slot; haven't seen this id before */
-			map[i].old = old_id;
-			map[i].cur = cur_id;
-			return true;
-		}
+	for (i = 0; i < idmap->cnt; i++) {
 		if (map[i].old == old_id)
 			return map[i].cur == cur_id;
 		if (map[i].cur == cur_id)
 			return false;
 	}
+
+	/* Reached the end of known mappings; haven't seen this id before */
+	if (idmap->cnt < BPF_ID_MAP_SIZE) {
+		map[idmap->cnt].old = old_id;
+		map[idmap->cnt].cur = cur_id;
+		idmap->cnt++;
+		return true;
+	}
+
 	/* We ran out of idmap slots, which should be impossible */
 	WARN_ON_ONCE(1);
 	return false;
@@ -19470,8 +19473,10 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
 
 static void reset_idmap_scratch(struct bpf_verifier_env *env)
 {
-	env->idmap_scratch.tmp_id_gen = env->id_gen;
-	memset(&env->idmap_scratch.map, 0, sizeof(env->idmap_scratch.map));
+	struct bpf_idmap *idmap = &env->idmap_scratch;
+
+	idmap->tmp_id_gen = env->id_gen;
+	idmap->cnt = 0;
 }
 
 static bool states_equal(struct bpf_verifier_env *env,
-- 
2.39.5

