Message-ID: <75807149f7de7a106db0ccda88e5d4439b94a1e7.camel@gmail.com>
Date: Thu, 15 Jan 2026 22:51:19 -0800
From: Eduard Zingerman <eddyz87@...il.com>
To: Qiliang Yuan <realwujing@...il.com>
Cc: andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
daniel@...earbox.net, haoluo@...gle.com, jolsa@...nel.org,
kpsingh@...nel.org, linux-kernel@...r.kernel.org, martin.lau@...ux.dev,
sdf@...ichev.me, song@...nel.org, yonghong.song@...ux.dev,
yuanql9@...natelecom.cn
Subject: Re: [PATCH] bpf/verifier: compress bpf_reg_state by using bitfields
On Fri, 2026-01-16 at 14:07 +0800, Qiliang Yuan wrote:
> Hi Eduard,
>
> On Thu, Jan 15, 2026, Eduard Zingerman wrote:
> > veristat collects verifier memory usage statistics.
> > Does this change have an impact on programs generated for
> > e.g. selftests and sched_ext?
> >
> > In general, you posted 4 patches claiming performance improvements,
> > but none of them are supported by any measurements.
> >
> > P.S.
> > Is this LLM-generated?
>
> Thank you for the feedback. I would like to clarify that these optimizations
> are the result of a deliberate engineering effort to address specific
> performance bottlenecks in the BPF verifier. These improvements were identified
> through my personal code analysis over the past two months, though I have only
> recently started submitting them to the community.
>
> Regarding the impact on selftests and sched_ext: I have verified these changes
> using 'veristat' against the BPF selftests. Since these optimizations target
> the core verifier engine and structural layout, they benefit any complex BPF
> program, including those in sched_ext. The results show a clear reduction in
> verification duration (up to 56%) and peak memory usage (due to the reduction of
> struct bpf_reg_state from 112 to 104 bytes), with zero changes in the total
> instruction or state counts. This confirms that the verification logic remains
> identical while resource efficiency is significantly improved.
>
> The specific order and context of the four patches are as follows:
>
> 1. bpf/verifier: implement slab cache for verifier state list
> (https://lore.kernel.org/all/tencent_0074C23A28B59EA264C502FA3C9EF6622A0A@qq.com/)
> Focuses on reducing allocation overhead. Detailed benchmark results added in:
> (https://lore.kernel.org/all/tencent_9C541313B9B3C381AB950BC531F6C627ED05@qq.com/)
From that report:
> arena_strsearch          121 us   64 us  -47.11%
> bpf_loop:stack_check     747 us  469 us  -37.22%
> bpf_loop:test_prog       519 us  386 us  -25.63%
> bpf_loop:prog_null_ctx   202 us  162 us  -19.80%
Instructions processed by the verifier, from the same report:
- arena_strsearch: 20
- bpf_loop:stack_check: 64
- bpf_loop:test_prog: 325
- bpf_loop:prog_null_ctx: 22
With such a small number of instructions processed, the results are
nothing more than a random fluke. To get more or less reasonable
impact measurements, please use the 'perf' tool and programs where the
verifier needs to process tens or hundreds of thousands of instructions.
> 2. bpf/verifier: compress bpf_reg_state by using bitfields
> (https://lore.kernel.org/all/20260115144946.439069-1-realwujing@gmail.com/)
> This is a structural memory optimization. By packing 'frameno', 'subreg_def',
> and 'precise' into bitfields, we eliminated 7 bytes of padding, reducing
> the struct size from 112 to 104 bytes. This is a deterministic memory
> saving based on object layout, which is particularly effective for
> large-scale verification states.
For this optimization veristat is a reasonable tool to use: it can
track a 'mem_peak' value.
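To illustrate the layout argument in #2, here is a minimal sketch of how
folding three trailing fields into one bitfield word can remove tail
padding. The structs are hypothetical stand-ins, not the real
struct bpf_reg_state: the 96-byte 'payload' array, the names reg_plain /
reg_packed, and the bit widths (4/27/1) are all assumptions chosen only
to reproduce a 112 -> 104 byte change on a typical 64-bit ABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the real struct bpf_reg_state. */
struct reg_plain {
	uint64_t payload[12];	/* 96 bytes standing in for the other fields */
	uint32_t frameno;
	uint32_t subreg_def;
	bool     precise;	/* 1 byte, followed by tail padding */
};

/* Same information, with the three small fields sharing one 32-bit word. */
struct reg_packed {
	uint64_t payload[12];
	uint32_t frameno    : 4;	/* assumed width */
	uint32_t subreg_def : 27;	/* assumed width */
	uint32_t precise    : 1;
};

/* Bytes saved per object by the packed layout. */
size_t bytes_saved(void)
{
	return sizeof(struct reg_plain) - sizeof(struct reg_packed);
}

/* Build a packed register; values must fit the assumed bit widths. */
struct reg_packed make_packed(uint32_t frameno, uint32_t subreg_def,
			      bool precise)
{
	struct reg_packed r = {
		.frameno    = frameno,
		.subreg_def = subreg_def,
		.precise    = precise,
	};
	return r;
}
```

The saving per object is fixed by the layout, but whether it shows up in
peak memory depends on how many register states a given program keeps
alive, which is why a 'mem_peak' comparison is still needed.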
> 3. bpf/verifier: optimize ID mapping reset in states_equal
> (https://lore.kernel.org/all/20260115150405.443581-1-realwujing@gmail.com/)
> This is an algorithmic optimization similar to memoization. By tracking the
> high-water mark of used IDs, it avoids a full 4.7KB memset in every
> states_equal() call. This reduces the complexity of resetting the ID map
> from O(MAX_SIZE) to O(ACTUAL_USED), which significantly speeds up state
> pruning during complex verification.
As I said before, this is a useful change.
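For readers following along, the high-water-mark idea in #3 can be
sketched as below. This is an illustration under assumptions, not the
code from the patch: the names (struct idmap, idmap_check, idmap_reset)
and the IDMAP_SLOTS capacity are hypothetical. The point is only that
reset cost follows the number of slots actually used rather than the
size of the whole array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IDMAP_SLOTS 600		/* assumed capacity, for illustration only */

struct id_pair {
	unsigned int old_id;
	unsigned int cur_id;
};

struct idmap {
	struct id_pair map[IDMAP_SLOTS];
	size_t used;		/* high-water mark: slots [0, used) are live */
};

/* Return true if old_id <-> cur_id is consistent with the pairs seen so far. */
bool idmap_check(struct idmap *m, unsigned int old_id, unsigned int cur_id)
{
	size_t i;

	for (i = 0; i < m->used; i++) {
		if (m->map[i].old_id == old_id)
			return m->map[i].cur_id == cur_id;
		if (m->map[i].cur_id == cur_id)
			return false;	/* cur_id already paired with another id */
	}
	if (m->used == IDMAP_SLOTS)
		return false;		/* out of slots */
	m->map[m->used].old_id = old_id;
	m->map[m->used].cur_id = cur_id;
	m->used++;
	return true;
}

/* Clear only the slots that were touched; returns the bytes cleared. */
size_t idmap_reset(struct idmap *m)
{
	size_t cleared = m->used * sizeof(m->map[0]);

	memset(m->map, 0, cleared);
	m->used = 0;
	return cleared;
}
```

With two pairs recorded, idmap_reset() clears 2 * sizeof(struct id_pair)
bytes instead of the full array.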
> 4. bpf/verifier: optimize precision backtracking by skipping precise bits
> (https://lore.kernel.org/all/20260115152037.449362-1-realwujing@gmail.com/)
> Following your suggestion to refactor the logic into the core engine for
> better coverage and clarity, I have provided a v2 version of this patch here:
> (https://lore.kernel.org/all/20260116045839.23743-1-realwujing@gmail.com/)
> This v2 version specifically addresses your feedback by centralizing the
> logic and includes a comprehensive performance comparison (veristat results)
> in the commit log. It reduces the complexity of redundant backtracking
> requests from O(D) (where D is history depth) to O(1) by utilizing the
> 'precise' flag to skip already-processed states.
Same as with #1: using the veristat duration metric, especially for such
small programs, is not a reasonable way to analyze performance.
> Best regards,
>
> Qiliang Yuan