Message-ID: <75807149f7de7a106db0ccda88e5d4439b94a1e7.camel@gmail.com>
Date: Thu, 15 Jan 2026 22:51:19 -0800
From: Eduard Zingerman <eddyz87@...il.com>
To: Qiliang Yuan <realwujing@...il.com>
Cc: andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
 daniel@...earbox.net, 	haoluo@...gle.com, jolsa@...nel.org,
 kpsingh@...nel.org, 	linux-kernel@...r.kernel.org, martin.lau@...ux.dev,
 sdf@...ichev.me, 	song@...nel.org, yonghong.song@...ux.dev,
 yuanql9@...natelecom.cn
Subject: Re: [PATCH] bpf/verifier: compress bpf_reg_state by using bitfields

On Fri, 2026-01-16 at 14:07 +0800, Qiliang Yuan wrote:
> Hi Eduard,
> 
> On Thu, Jan 15, 2026, Eduard Zingerman wrote:
> > veristat collects verifier memory usage statistics.
> > Does this change have an impact on programs generated for
> > e.g. selftests and sched_ext?
> > 
> > In general, you posted 4 patches claiming performance improvements,
> > but none of them are supported by any measurements.
> > 
> > P.S.
> > Is this LLM-generated?
> 
> Thank you for the feedback. I would like to clarify that these optimizations
> are the result of a deliberate engineering effort to address specific
> performance bottlenecks in the BPF verifier. These improvements were identified
> through my personal code analysis over the past two months, though I have only
> recently started submitting them to the community.
> 
> Regarding the impact on selftests and sched_ext: I have verified these changes
> using 'veristat' against the BPF selftests. Since these optimizations target
> the core verifier engine and structural layout, they benefit any complex BPF
> program, including those in sched_ext. The results show a clear reduction in
> verification duration (up to 56%) and peak memory usage (due to the reduction of
> struct bpf_reg_state from 112 to 104 bytes), with zero changes in the total
> instruction or state counts. This confirms that the verification logic remains
> identical while resource efficiency is significantly improved.
> 
> The specific order and context of the four patches are as follows:
> 
> 1. bpf/verifier: implement slab cache for verifier state list
>    (https://lore.kernel.org/all/tencent_0074C23A28B59EA264C502FA3C9EF6622A0A@qq.com/)
>    Focuses on reducing allocation overhead. Detailed benchmark results added in:
>    (https://lore.kernel.org/all/tencent_9C541313B9B3C381AB950BC531F6C627ED05@qq.com/)

From that report:

>  arena_strsearch                 121 us               64 us               -47.11%
>  bpf_loop:stack_check            747 us               469 us              -37.22%
>  bpf_loop:test_prog              519 us               386 us              -25.63%
>  bpf_loop:prog_null_ctx          202 us               162 us              -19.80%

Instructions processed by the verifier, according to the same report:
- arena_strsearch:        20
- bpf_loop:stack_check:   64
- bpf_loop:test_prog:     325
- bpf_loop:prog_null_ctx: 22

With such a small number of instructions processed, the results are
nothing more than a random fluke. To get more or less reasonable
impact measurements, please use the 'perf' tool and use programs where
the verifier needs to process tens or hundreds of thousands of instructions.

> 2. bpf/verifier: compress bpf_reg_state by using bitfields
>    (https://lore.kernel.org/all/20260115144946.439069-1-realwujing@gmail.com/)
>    This is a structural memory optimization. By packing 'frameno', 'subreg_def',
>    and 'precise' into bitfields, we eliminated 7 bytes of padding, reducing
>    the struct size from 112 to 104 bytes. This is a deterministic memory
>    saving based on object layout, which is particularly effective for
>    large-scale verification states.

For this optimization veristat is a reasonable tool to use: it can
track the 'mem_peak' value.
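
To make the layout arithmetic in item 2 concrete, here is a minimal
sketch. It does not reproduce struct bpf_reg_state: the struct names, the
96-byte payload and the bitfield widths are all assumptions, chosen only so
that the sizes come out as 112 and 104 bytes on a typical LP64 target
(x86-64 or arm64 with GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "before" layout; NOT the kernel struct. */
struct reg_loose {
	uint64_t payload[12];    /* 96 bytes standing in for all other fields */
	uint32_t frameno;        /* offset 96 */
	uint32_t subreg_def;     /* offset 100 */
	bool     precise;        /* offset 104, followed by 7 bytes of padding */
};

/* Hypothetical "after" layout: three fields share one 32-bit unit. */
struct reg_packed {
	uint64_t payload[12];    /* same 96 bytes */
	unsigned int frameno    : 4;   /* widths are assumptions */
	unsigned int subreg_def : 27;
	unsigned int precise    : 1;
	/* 4 bytes of tail padding keep the struct 8-byte aligned */
};

/* Bytes saved when nregs register states are kept in memory. */
static size_t bytes_saved(size_t nregs)
{
	return nregs * (sizeof(struct reg_loose) - sizeof(struct reg_packed));
}
```

The saving is 8 bytes per register state, so whether it is visible at all
depends on how many states a program keeps alive at its peak, which is what
'mem_peak' would show.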

> 3. bpf/verifier: optimize ID mapping reset in states_equal
>    (https://lore.kernel.org/all/20260115150405.443581-1-realwujing@gmail.com/)
>    This is an algorithmic optimization similar to memoization. By tracking the
>    high-water mark of used IDs, it avoids a full 4.7KB memset in every
>    states_equal() call. This reduces the complexity of resetting the ID map
>    from O(MAX_SIZE) to O(ACTUAL_USED), which significantly speeds up state
>    pruning during complex verification.

As I said before, this is a useful change.
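
For readers following along, the idea in item 3 can be sketched roughly as
below. This is a simplified illustration and not the patch itself: the type
and function names and the 'used' counter are hypothetical, and the table
capacity of 600 pairs is an assumption picked so the table is about 4.7KB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_MAP_SIZE 600   /* assumed capacity: 600 * 8 bytes = 4800 bytes */

struct id_pair {
	uint32_t old_id;
	uint32_t cur_id;
};

struct id_map {
	struct id_pair map[ID_MAP_SIZE];
	size_t used;   /* high-water mark: only entries [0, used) are in use */
};

/* Clear only the entries that were written, not the whole table. */
static void id_map_reset(struct id_map *m)
{
	memset(m->map, 0, m->used * sizeof(m->map[0]));
	m->used = 0;
}

/* Return true if old_id -> cur_id is consistent with the mapping so far. */
static bool id_map_check(struct id_map *m, uint32_t old_id, uint32_t cur_id)
{
	for (size_t i = 0; i < m->used; i++) {
		if (m->map[i].old_id == old_id)
			return m->map[i].cur_id == cur_id;
		if (m->map[i].cur_id == cur_id)
			return false;   /* cur_id already paired with another old_id */
	}
	if (m->used == ID_MAP_SIZE)
		return false;           /* table full */
	m->map[m->used].old_id = old_id;
	m->map[m->used].cur_id = cur_id;
	m->used++;
	return true;
}
```

With this shape the reset cost follows the number of IDs actually seen in
one comparison rather than the table capacity.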

> 4. bpf/verifier: optimize precision backtracking by skipping precise bits
>    (https://lore.kernel.org/all/20260115152037.449362-1-realwujing@gmail.com/)
>    Following your suggestion to refactor the logic into the core engine for
>    better coverage and clarity, I have provided a v2 version of this patch here:
>    (https://lore.kernel.org/all/20260116045839.23743-1-realwujing@gmail.com/)
>    This v2 version specifically addresses your feedback by centralizing the
>    logic and includes a comprehensive performance comparison (veristat results)
>    in the commit log. It reduces the complexity of redundant backtracking
>    requests from O(D) (where D is history depth) to O(1) by utilizing the
>    'precise' flag to skip already-processed states.

Same as with #1: using veristat duration metric, especially for such
small programs, is not a reasonable performance analysis.
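
As an aside on what item 4 describes, the early exit can be illustrated
with a toy model. This is not verifier code: 'struct toy_state', the parent
chain and the function name are hypothetical, and the sketch assumes that a
state marked precise implies all of its ancestors are already precise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a state history: each state points at its parent. */
struct toy_state {
	struct toy_state *parent;
	bool precise;
};

/*
 * Walk towards the root marking states precise, stopping at the first
 * state that is already marked. Returns how many states were visited.
 */
static size_t toy_mark_precise(struct toy_state *st)
{
	size_t visited = 0;

	while (st && !st->precise) {
		st->precise = true;
		visited++;
		st = st->parent;
	}
	return visited;
}
```

In this model the first request on a chain of depth D visits D states, and
a repeated request on the same chain visits none. Whether that shows up as
a measurable difference is a separate question from the one above.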

> Best regards,
> 
> Qiliang Yuan
