lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Mon, 14 Aug 2023 21:41:45 +0800
From: Leon Hwang <hffilwlqm@...il.com>
To: bpf@...r.kernel.org
Cc: ast@...nel.org,
	daniel@...earbox.net,
	andrii@...nel.org,
	martin.lau@...ux.dev,
	song@...nel.org,
	yonghong.song@...ux.dev,
	john.fastabend@...il.com,
	kpsingh@...nel.org,
	sdf@...gle.com,
	haoluo@...gle.com,
	jolsa@...nel.org,
	x86@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	bp@...en8.de,
	dave.hansen@...ux.intel.com,
	hpa@...or.com,
	mykolal@...com,
	shuah@...nel.org,
	davem@...emloft.net,
	dsahern@...nel.org,
	hffilwlqm@...il.com,
	tangyeechou@...il.com,
	kernel-patches-bot@...com,
	maciej.fijalkowski@...el.com,
	netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org
Subject: [RFC PATCH bpf-next 0/2] bpf, x64: Fix tailcall infinite loop bug

>From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), the tailcall on x64 works better than before.

>From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcall is able to run in BPF subprograms on x64.

>From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls itself.

As a result, a tailcall infinite loop comes up. And the loop would halt
the machine.

As we know, in tail call context, the tail_call_cnt propagates by stack
and RAX register between BPF subprograms. So do it in FENTRY/FEXIT
trampolines.

How did I discover the bug?

>From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for
subprogs with tailcalls"), the total stack size limits to around 8KiB.
Then, I write some bpf progs to validate the stack consuming, that are
tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf[1].

At that time, accidently, I made a tailcall loop. And then the loop halted
my VM. Without the loop, the bpf progs would consume over 8KiB stack size.
But the _stack-overflow_ did not halt my VM.

With bpf_printk(), I confirmed that the tailcall count limit did not work
expectedly. Next, read the code and fix it.

Finally, unfortunately, I only fix it on x64 but other arches. As a
result, CI tests failed because this bug hasn't been fixed on s390x.

Some helps are requested.

[1]: https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/tailcall-stackoverflow

Leon Hwang (2):
  bpf, x64: Fix tailcall infinite loop bug
  selftests/bpf: Add testcases for tailcall infinite loop bug fixing

 arch/x86/net/bpf_jit_comp.c                   |  23 ++-
 include/linux/bpf.h                           |   6 +
 kernel/bpf/trampoline.c                       |   5 +-
 kernel/bpf/verifier.c                         |   9 +-
 .../selftests/bpf/prog_tests/tailcalls.c      | 194 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 ++
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 ++
 7 files changed, 264 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c


base-commit: 9930e4af4b509bcf6f060b09b16884f26102d110
-- 
2.41.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ