Message-ID: <CAADnVQKkDCNB5xk-gnUWXJ44LqG0gRaHfE5WjbAwZL-vnV+6oA@mail.gmail.com>
Date: Mon, 19 Jan 2026 18:09:37 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Qiliang Yuan <realwujing@...il.com>
Cc: Eduard <eddyz87@...il.com>, Andrii Nakryiko <andrii.nakryiko@...il.com>, 
	Andrii Nakryiko <andrii@...nel.org>, Alexei Starovoitov <ast@...nel.org>, bpf <bpf@...r.kernel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	KP Singh <kpsingh@...nel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Stanislav Fomichev <sdf@...ichev.me>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, yuanql9@...natelecom.cn
Subject: Re: [PATCH v3] bpf/verifier: optimize ID mapping reset in states_equal

On Mon, Jan 19, 2026 at 5:56 PM Qiliang Yuan <realwujing@...il.com> wrote:
>
> The verifier uses an ID mapping table (struct bpf_idmap) during state
> equivalence checks. Currently, reset_idmap_scratch performs a full memset
> on the entire map (~4.7KB) in every call to states_equal.
>
> This ensures that reset overhead is minimal and the search loop is
> bounded by the number of IDs actually encountered in the current
> equivalence check.
>
> Benchmark results (system-wide 'perf stat' during high-concurrency 'veristat'
> stress test, 60s):
>
> The following results, captured using perf while running veristat in parallel
> across all CPU cores, show a significant reduction in instruction overhead
> (~9.3%) and branch executions (~11%), confirming that the O(1) reset logic
> significantly reduces the verifier's workload during state equivalence
> checks.

You were already told by multiple people to stop doing these pointless
stress runs across all cpus.

> Metric          | Baseline      | Patched       | Delta
> ----------------|---------------|---------------|----------
> Iterations      | 5710          | 5731          | +0.37%
> Instructions    | 1.714 T       | 1.555 T       | -9.28%
> Inst/Iter       | 300.2 M       | 271.3 M       | -9.63%
> Cycles          | 1.436 T       | 1.335 T       | -7.03%
> Branches        | 350.4 B       | 311.9 B       | -10.99%
> Migrations      | 25,977        | 23,524        | -9.44%
>
> Test Command:
>   seq 1 2000000 | sudo perf stat -a -- \
>     timeout 60s xargs -P $(nproc) -I {} ./veristat access_map_in_map.bpf.o

and you were told to stop using tiny programs that don't exercise the
path you're changing in the patch.

> Detailed Performance Stats:
>
> Baseline:
>  Performance counter stats for 'system wide':
>
>          6,735,538      context-switches                 #   3505.5 cs/sec  cs_per_second
>       1,921,431.27 msec cpu-clock                        #     32.0 CPUs  CPUs_utilized
>             25,977      cpu-migrations                   #     13.5 migrations/sec  migrations_per_second
>          7,268,841      page-faults                      #   3783.0 faults/sec  page_fault_per_second
>     18,662,357,052      branch-misses                    #      3.9 %  branch_miss_rate         (50.14%)
>    350,411,558,023      branches                         #    182.4 M/sec  branch_frequency     (66.85%)
>  1,435,774,261,319      cpu-cycles                       #      0.7 GHz  cycles_frequency       (66.95%)
>  1,714,154,229,503      instructions                     #      1.2 instructions  insn_per_cycle  (66.86%)
>    429,445,480,497      stalled-cycles-frontend          #     0.30 frontend_cycles_idle        (66.36%)
>
>       60.035899231 seconds time elapsed
>
> Patched:
>  Performance counter stats for 'system wide':
>
>          6,662,371      context-switches                 #   3467.3 cs/sec  cs_per_second
>       1,921,497.78 msec cpu-clock                        #     32.0 CPUs  CPUs_utilized
>             23,524      cpu-migrations                   #     12.2 migrations/sec  migrations_per_second
>          7,783,064      page-faults                      #   4050.5 faults/sec  page_faults_per_second
>     18,181,655,163      branch-misses                    #      4.3 %  branch_miss_rate         (50.15%)
>    311,865,239,743      branches                         #    162.3 M/sec  branch_frequency     (66.86%)
>  1,334,859,779,821      cpu-cycles                       #      0.7 GHz  cycles_frequency       (66.96%)
>  1,555,086,465,845      instructions                     #      1.2 instructions  insn_per_cycle  (66.87%)
>    407,666,712,045      stalled-cycles-frontend          #     0.31 frontend_cycles_idle        (66.35%)
>
>       60.034702643 seconds time elapsed
>
> Acked-by: Eduard Zingerman <eddyz87@...il.com>
> Signed-off-by: Qiliang Yuan <realwujing@...il.com>
> ---
> v3:
>  - Remove Suggested-by tags per Eduard's feedback.
>  - Add Eduard's Acked-by.
>  - Credit Andrii Nakryiko for the further optimization suggestion.
>  - Mention the limitation of system-wide profiling in commit message.

Do not add "educational" information to the commit log.
The commit should describe why and what is being done and the result.
If you cannot observe the difference before/after, then don't say anything.

"significantly reduces the verifier's workload" is not true.
You were not able to measure it. Don't invent gains.

pw-bot: cr
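
For readers without the patch at hand, the idea in the quoted commit
message (an O(1) reset, with the search loop bounded by the number of IDs
actually seen) can be sketched in standalone C as below. This is only an
illustration under assumptions, not the patch itself: the names idmap,
id_pair, cnt, idmap_reset, idmap_check and the IDMAP_SIZE value are
hypothetical and do not come from the kernel source or from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IDMAP_SIZE 600 /* hypothetical capacity, not the kernel's constant */

struct id_pair {
	uint32_t old_id;
	uint32_t cur_id;
};

struct idmap {
	uint32_t cnt;                   /* number of slots currently in use */
	struct id_pair map[IDMAP_SIZE]; /* only map[0..cnt-1] is ever read */
};

/* O(1) reset: drop the used-slot count instead of clearing the array. */
static void idmap_reset(struct idmap *m)
{
	assert(m != NULL);
	m->cnt = 0;
}

/*
 * Return true if mapping old_id -> cur_id is consistent with the pairs
 * recorded since the last reset, recording the pair if it is new.
 * The scan visits only the cnt slots filled so far.
 */
static bool idmap_check(struct idmap *m, uint32_t old_id, uint32_t cur_id)
{
	assert(m != NULL);
	for (uint32_t i = 0; i < m->cnt; i++) {
		if (m->map[i].old_id == old_id)
			return m->map[i].cur_id == cur_id;
		if (m->map[i].cur_id == cur_id)
			return false; /* cur_id already bound to another old_id */
	}
	if (m->cnt == IDMAP_SIZE)
		return false; /* table full: conservatively report a mismatch */
	m->map[m->cnt].old_id = old_id;
	m->map[m->cnt].cur_id = cur_id;
	m->cnt++;
	return true;
}
```

Whether this saves anything measurable depends on how often the
comparison path runs and how many IDs it sees, which is why a benchmark
has to use programs that actually exercise that path.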
