Message-ID: <20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg>
Date: Thu, 15 May 2025 21:12:25 +0800
From: Shung-Hsi Yu <shung-hsi.yu@...e.com>
To: bpf@...r.kernel.org, linux-mm@...ck.org, Kees Cook <kees@...nel.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Ihor Solodrai <ihor.solodrai@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Michal Hocko <mhocko@...e.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Uladzislau Rezki <urezki@...il.com>, linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org, 
	regressions@...ts.linux.dev, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>, Eduard Zingerman <eddyz87@...il.com>
Subject: [REGRESSION] bpf verifier slowdown due to vrealloc() change since
 6.15-rc6

Hi,

There is an observable slowdown when running the BPF selftests on a
6.15-rc6 kernel[1] built with
tools/testing/selftests/bpf/{config,config.x86_64}. Overall, the BPF
selftests now take about twice as long to run (from ~25m to ~50m), and
verif_scale_loop3_fail went from single-digit seconds to 6 minutes.

Pawan bisected the slowdown to commit a0309faf1cb0 "mm: vmalloc:
support more granular vrealloc() sizing"[2]. To narrow the issue down
further, I removed the only kvrealloc() call in kernel/bpf/ by
reverting commit 96a30e469ca1 "bpf: use common instruction history
across all states", so that _krealloc()_ is used instead of
kvrealloc(), and observed that there is _no_ slowdown[3]. While the
bisect and the revert were done on 6.14.7-rc2, I think the result
should still be pretty representative.

In short, the following were tested:
- 6.15-rc6 (has a0309faf1cb0) -> slowdown
- 6.14.7-rc2 (has a0309faf1cb0) -> slowdown
- 6.14.7-rc2 (has a0309faf1cb0, call to kvrealloc in
  kernel/bpf/verifier.c replaced with krealloc) -> _no_ slowdown

So the vrealloc() change is causing the slowdown in the kvrealloc()
call within push_insn_history():

  /* for any branch, call, exit record the history of jmps in the given state */
  static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_state *cur,
  			     int insn_flags, u64 linked_regs)
  {
  	struct bpf_insn_hist_entry *p;
  	size_t alloc_size;
  	...
  	if (cur->insn_hist_end + 1 > env->insn_hist_cap) {
  		alloc_size = size_mul(cur->insn_hist_end + 1, sizeof(*p));
  		p = kvrealloc(env->insn_hist, alloc_size, GFP_USER);
  		if (!p)
  			return -ENOMEM;
  		env->insn_hist = p;
  		env->insn_hist_cap = alloc_size / sizeof(*p);
  	}
  
  	p = &env->insn_hist[cur->insn_hist_end];
  	p->idx = env->insn_idx;
  	p->prev_idx = env->prev_insn_idx;
  	p->flags = insn_flags;
  	p->linked_regs = linked_regs;
  
  	cur->insn_hist_end++;
  	env->cur_hist_ent = p;
  
  	return 0;
  }
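
Note that the snippet above sizes the buffer to exactly
insn_hist_end + 1 entries, so once the capacity is reached every push
goes through kvrealloc() again. The following is a user-space model,
not kernel code, of why that pattern is sensitive to the cost of a
resize. It assumes each resize does work proportional to the current
array size (for example copying or re-initializing the buffer); this
report does not establish that this is what vrealloc() now does. The
function names are hypothetical, and the doubling variant is shown only
for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: count how many existing entries are touched by
 * resizes while an array grows to n entries, one push at a time.
 * ASSUMPTION: every resize touches all entries currently stored.
 */

/* Capacity is set to exactly len + 1, as in the snippet above. */
uint64_t touched_grow_by_one(size_t n)
{
	uint64_t touched = 0;
	size_t cap = 0;

	for (size_t len = 0; len < n; len++) {
		if (len + 1 > cap) {
			touched += len;		/* resize on every push */
			cap = len + 1;
		}
	}
	return touched;			/* n * (n - 1) / 2 */
}

/* Capacity doubles whenever it is exhausted (shown for comparison). */
uint64_t touched_grow_doubling(size_t n)
{
	uint64_t touched = 0;
	size_t cap = 0;

	for (size_t len = 0; len < n; len++) {
		if (len + 1 > cap) {
			touched += len;		/* resize only at powers of two */
			cap = cap ? cap * 2 : 1;
			assert(cap >= len + 1);
		}
	}
	return touched;			/* less than 2 * n */
}
```

Under that assumption, growing to 1000 entries touches 499500 entries
in total with exact sizing versus 1023 with doubling, so any extra
per-resize cost is multiplied by the number of pushes in the first
case.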

BPF CI probably hasn't hit this yet because bpf-next has only reached
6.15-rc4.

Shung-Hsi

#regzbot introduced: a0309faf1cb0622cac7c820150b7abf2024acff5

1: https://github.com/shunghsiyu/libbpf/actions/runs/15038992168/job/42266125686
2: https://lore.kernel.org/stable/20250515041659.smhllyarxdwp7cav@desk/
3: https://github.com/shunghsiyu/libbpf/actions/runs/15043433548/job/42280277024
