Message-ID: <6205668.MhkbZ0Pkbq@7950hx>
Date: Sat, 17 Jan 2026 19:27:53 +0800
From: Menglong Dong <menglong.dong@...ux.dev>
To: Qiliang Yuan <realwujing@...il.com>
Cc: alexei.starovoitov@...il.com, andrii@...nel.org, ast@...nel.org,
bpf@...r.kernel.org, daniel@...earbox.net, eddyz87@...il.com,
haoluo@...gle.com, john.fastabend@...il.com, jolsa@...nel.org,
kpsingh@...nel.org, linux-kernel@...r.kernel.org, martin.lau@...ux.dev,
memxor@...il.com, realwujing@...il.com, realwujing@...com, sdf@...ichev.me,
song@...nel.org, yonghong.song@...ux.dev, yuanql9@...natelecom.cn
Subject: Re: [PATCH v2] bpf/verifier: implement slab cache for verifier state list

On 2026/1/17 11:26, Qiliang Yuan wrote:
> On Fri, 16 Jan 2026 22:50:36 +0800, Menglong Dong <menglong.dong@...ux.dev> wrote:
> > On 2026/1/16 21:29, Qiliang Yuan wrote:
> > > The BPF verifier's state exploration logic in is_state_visited() frequently
> > > allocates and deallocates 'struct bpf_verifier_state_list' nodes. Currently,
> > > these allocations use generic kzalloc(), which leads to significant memory
> > > management overhead and page faults during high-complexity verification,
> > > especially in multi-core parallel scenarios.
> > >
> > > This patch introduces a dedicated 'bpf_verifier_state_list' slab cache to
> > > optimize these allocations, providing better speed, reduced fragmentation,
> > > and improved cache locality. All allocation and deallocation paths are
> > > migrated to use kmem_cache_zalloc() and kmem_cache_free().
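For reference, a minimal sketch of the change being described; identifiers
such as bpf_state_list_cache are illustrative, not necessarily the patch's
exact names:

	/* Created once at init; sized for the state list node. */
	static struct kmem_cache *bpf_state_list_cache;

	static int __init bpf_verifier_cache_init(void)
	{
		bpf_state_list_cache = kmem_cache_create(
				"bpf_verifier_state_list",
				sizeof(struct bpf_verifier_state_list),
				0, SLAB_PANIC, NULL);
		return 0;
	}
	late_initcall(bpf_verifier_cache_init);

	/* In is_state_visited(), kzalloc()/kfree() then become: */
	new_sl = kmem_cache_zalloc(bpf_state_list_cache, GFP_KERNEL);
	...
	kmem_cache_free(bpf_state_list_cache, sl);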
> > >
> > > Performance evaluation using a stress test (1000 conditional branches)
> > > executed in parallel on 32 CPU cores for 60 seconds shows significant
> > > improvements:
> >
> > This patch is a bit of a mess. First, don't send a new version as a reply
> > to your previous version.
>
> Hi Menglong,
>
> Congratulations on obtaining your @linux.dev email! It is great to see your
> contribution to the community being recognized.
>
> The core logic remains unchanged. Following suggestions from several
> reviewers, I've added the perf benchmark data and sent this as a reply to the
> previous thread to make the context and review history easier to track.
You can put a link to your previous version in the change log. I
suspect patchwork can't even detect this new version if you send
it as a reply.
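For example, a v2 could carry the history below the '---' marker, where
'git am' drops it, with a link to the previous posting (URL elided here):

	Signed-off-by: Qiliang Yuan <...>
	---
	v1 -> v2: added perf benchmark data
	v1: https://lore.kernel.org/bpf/...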
>
> > > Metric | Baseline | Patched | Delta (%)
> > > --------------------|---------------|---------------|----------
> > > Page Faults | 12,377,064 | 8,534,044 | -31.05%
> > > IPC | 1.17 | 1.22 | +4.27%
> > > CPU Cycles | 1,795.37B | 1,700.33B | -5.29%
> > > Instructions | 2,102.99B | 2,074.27B | -1.37%
> >
> > And the test case is odd too. What performance improvement do we
> > get from this test result? You run veristat in a loop and record the
> > performance with perf for 60s, so what does that tell us? Shouldn't you
> > run veristat a fixed number of times and measure the performance, such as
> > the duration or the CPU cycles?
>
> Following suggestions from several reviewers, I aimed to provide perf
> benchmark data for comparison. However, existing veristat tests do not
> frequently trigger the specific state list allocation paths I modified. This
> is why I constructed a dedicated stress test and included the code in the
> commit message to clearly demonstrate the performance gains.
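A hedged sketch of the kind of branch-heavy program such a test might use
(illustrative only; the real test is the one included in the commit
message). Each conditional forks the verifier's path exploration, so
is_state_visited() allocates and frees many state list nodes:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("socket")
	int branch_stress(struct __sk_buff *skb)
	{
		__u64 acc = 0;

		/* Every test below splits verifier exploration in two. */
		if (skb->len > 1) acc += 1;
		if (skb->len > 2) acc += 2;
		if (skb->len > 3) acc += 3;
		/* ... repeated on the order of 1000 times ... */
		return acc & 1;
	}

	char LICENSE[] SEC("license") = "GPL";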
>
> > You optimize the verifier to reduce the verification time in your case,
> > which seems to be a complex BPF program that consumes much time
> > in the verifier. So what performance increase do you get in your case?
>
> The performance gains are primarily seen in the 31.05% reduction in Page Faults
> and the 4.27% increase in IPC. These metrics indicate that moving to a
> dedicated slab cache significantly reduces memory management overhead and
> improves instruction throughput. Specifically, the reduction in CPU cycles
> (-5.29%) confirms that the verifier spends less time on internal allocation
> logic, which is crucial for complex BPF programs that involve deep state
> exploration.
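Numbers like the above can be gathered with something along the lines of
the following; the exact invocation is an assumption, not quoted from the
patch, and perf derives IPC as instructions per cycle:

	perf stat -e page-faults,cycles,instructions -- ./run_stress.sh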
You introduce the slab cache to speed up the verifier, so I think we need
a comparison, such as how long a complex BPF program takes in
the verifier. If it is no more than 1ms, then I think it doesn't make much
sense to chase a 5% speedup. After all, it's not a BPF runtime
overhead.
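One way to produce that comparison is veristat's compare mode, roughly as
below; treat the flags and stat names as a sketch against current veristat:

	# on the baseline kernel
	veristat -o csv -e file,prog,duration,insns prog.bpf.o > base.csv
	# on the patched kernel
	veristat -o csv -e file,prog,duration,insns prog.bpf.o > patched.csv
	# per-program deltas, including verification duration
	veristat -C base.csv patched.csv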
>
> > > Suggested-by: Kumar Kartikeya Dwivedi <memxor@...il.com>
> > > Suggested-by: Eduard Zingerman <eddyz87@...il.com>
> >
> > You don't need to add all the reviewers here, unless big changes are
> > made.
>
> That makes sense, thanks for the advice. I'll refine this in the next version.
>
> > > Signed-off-by: Qiliang Yuan <realwujing@...il.com>
> > > ---
> > > On Mon, 2026-01-12 at 19:15 +0100, Kumar Kartikeya Dwivedi wrote:
> > > > Did you run any numbers on whether this improves verification performance?
> > > > Without any compelling evidence, I would leave things as-is.
> >
> > This is not how we write change logs; please see how other people
> > do it.
>
> Actually, the content following the 'Signed-off-by' line and the '---' marker
> is specifically designed to be ignored by 'git am' when the patch is applied.
> Only the text above the triple-dash is preserved as the permanent commit
> message. I intentionally placed the responses to previous reviewer comments
> in that section so that you could see the context and history during review
> without those discussions being permanently recorded in the git log. You can
> verify this behavior by testing 'git am' on a similar patch.
>
> It's for this very reason that I decided to include the reply to reviewers
> directly within the v2 patch.
I don't think we do it this way, and it makes the patch look like a mess.
You can reply directly in the mail.
>