Message-ID: <c42dd869d9ba23f14681448581a9c8c7ec23105b.camel@gmail.com>
Date: Tue, 16 Sep 2025 02:14:47 -0700
From: Eduard Zingerman <eddyz87@...il.com>
To: syzbot <syzbot+3afc814e8df1af64b653@...kaller.appspotmail.com>,
andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
daniel@...earbox.net, haoluo@...gle.com, john.fastabend@...il.com,
jolsa@...nel.org, kpsingh@...nel.org, linux-kernel@...r.kernel.org,
martin.lau@...ux.dev, sdf@...ichev.me, song@...nel.org,
syzkaller-bugs@...glegroups.com, yonghong.song@...ux.dev
Subject: Re: [syzbot] [bpf?] WARNING in maybe_exit_scc
On Mon, 2025-09-15 at 16:40 -0700, Eduard Zingerman wrote:
> On Mon, 2025-09-15 at 15:34 -0700, Eduard Zingerman wrote:
>
> [...]
>
> > > verifier bug: scc exit: no visit info for call chain (1)(1)
> > > WARNING: CPU: 1 PID: 6013 at kernel/bpf/verifier.c:1949 maybe_exit_scc+0x768/0x8d0 kernel/bpf/verifier.c:1949
> >
> > Both this and [1] are reported for very similar programs:
> >
> > <this>                                  <[1]>
> > --------------------------------------------------------------------------------------------
> > (b7) r0 = -1023213567                   (b7) r0 = -1023213567
> > (bf) r3 = r10                           (bf) r3 = r10
> > (07) r3 += -512                         (07) r3 += -504
> > (72) *(u8 *)(r10 -16) = -8              (72) *(u8 *)(r10 -16) = -8
> > (71) r4 = *(u8 *)(r10 -16)              (71) r4 = *(u8 *)(r10 -16)
> > (65) if r4 s> 0xff000000 goto pc+2      (65) if r4 s> 0xff000000 goto pc+2
> > (2d) if r0 > r4 goto pc+5               (2d) if r0 > r4 goto pc+5
> > (20) r0 = *(u32 *)skb[60673]            (20) r0 = *(u32 *)skb[60673]
> > (7b) *(u64 *)(r3 +0) = r0               (7b) *(u64 *)(r3 +0) = r0
> > (1d) if r4 == r4 goto pc+0              (1d) if r4 == r4 goto pc+0
> > (7a) *(u64 *)(r10 -512) = -256          (7a) *(u64 *)(r10 -512) = -256
> > (db) lock *(u64 *)(r3 +0) |= r0         (db) r0 = atomic64_fetch_and((u64 *)(r3 +0), r0)
> > (b5) if r0 <= 0x0 goto pc-2             (b5) if r0 <= 0x0 goto pc-2
> > (95) exit                               (95) exit
> >
> > So, I assume it's the same issue. Looking into it.
> >
> > [1] https://lore.kernel.org/bpf/68c85b0d.050a0220.2ff435.03a5.GAE@google.com/T/#u
>
> Minimal reproducer:
>
> SEC("socket")
> __caps_unpriv(CAP_BPF)
> __naked void syzbot_bug(void)
> {
> 	asm volatile (
> 	"r0 = 100;"
> 	"1:"
> 	"*(u64 *)(r10 - 512) = r0;"
> 	"if r0 <= 0x0 goto 1b;"
> 	"exit;"
> 	::: __clobber_all);
> }
>
> And corresponding verifier log:
>
> Live regs before insn:
> 0: .......... (b7) r0 = 100
> 1 1: 0......... (7b) *(u64 *)(r10 -512) = r0
> 1 2: 0......... (b5) if r0 <= 0x0 goto pc-2
> 3: 0......... (95) exit
> Global function syzbot_bug() doesn't return scalar. Only those are supported.
> 0: R1=ctx() R10=fp0
> ; asm volatile ( @ verifier_and.c:118
> 0: (b7) r0 = 100 ; R0_w=100
> 1: (7b) *(u64 *)(r10 -512) = r0 ; R0_w=100 R10=fp0 fp-512_w=100
> 2: (b5) if r0 <= 0x0 goto pc-2
> mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1
> mark_precise: frame0: regs=r0 stack= before 1: (7b) *(u64 *)(r10 -512) = r0
> mark_precise: frame0: regs=r0 stack= before 0: (b7) r0 = 100
> 2: R0_w=100
> 3: (95) exit
>
> from 2 to 1 (speculative execution): R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
> 1: R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
> 1: (7b) *(u64 *)(r10 -512) = r0
> verifier bug: scc exit: no visit info for call chain (1)
> processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> [...]
Here is what happens:
- Verification process starts and gets to instruction (2) w/o creating
any checkpoints.
- A speculative execution of the false branch is pushed onto states
stack; main execution process predicts the branch as false and
continues to exit. Still no checkpoints.
- Speculative execution branch is popped from stack and proceeds from
instruction (1).
- Speculative execution immediately terminates, because verifier
detects an infinite loop and signals an error.
- update_branch_counts() is called for speculative execution state and
its branches count reaches zero.
- update_branch_counts() -> maybe_exit_scc() is called for a state
with insn_idx in SCC #1.
- maybe_exit_scc() assumes that when it is called for a state with
insn_idx in some SCC, there should be an instance of struct
bpf_scc_visit allocated for this SCC, which is not the case here.
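
To make the sequence above concrete, here is a toy model of it in plain C. It is only a sketch: every toy_* name is made up for illustration, the structures are not the kernel's, and the loop only approximates how update_branch_counts() walks up the parent chain and calls maybe_exit_scc() when a branch count drops to zero. The check modelled here is the unpatched one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's definitions. */
#define TOY_MAX_SCC 8

struct toy_state {
	struct toy_state *parent;
	int branches;     /* paths still being explored below this state */
	int scc;          /* SCC id of insn_idx, 0 means "not in any SCC" */
	bool speculative; /* unused by the unpatched check modelled here */
};

/* One flag per SCC: set once a checkpoint inside that SCC was created. */
bool toy_visit_allocated[TOY_MAX_SCC];

/* Models is_state_visited() -> maybe_enter_scc() for a new checkpoint. */
void toy_enter_scc(const struct toy_state *st)
{
	assert(st->scc >= 0 && st->scc < TOY_MAX_SCC);
	if (st->scc)
		toy_visit_allocated[st->scc] = true;
}

/* Models the unpatched assertion: a state in an SCC must have a visit. */
int toy_exit_scc(const struct toy_state *st)
{
	if (!st->scc)
		return 0;          /* not in an SCC, nothing to do */
	if (!toy_visit_allocated[st->scc])
		return -EFAULT;    /* "no visit info for call chain" */
	return 0;
}

/* Rough model of the branch count walk up the parent chain. */
int toy_update_branch_counts(struct toy_state *st)
{
	int err;

	while (st) {
		if (--st->branches)
			break;     /* other paths still pending */
		err = toy_exit_scc(st);
		if (err)
			return err;
		st = st->parent;
	}
	return 0;
}
```

In this model a speculative state sitting in SCC #1 with branches == 1 and no checkpoint created beforehand makes toy_update_branch_counts() return -EFAULT, which mirrors the warning in the report. Calling toy_enter_scc() for that SCC first makes the same call return 0.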

Why does maybe_exit_scc() assume that a bpf_scc_visit instance exists?

During non-speculative symbolic execution there are three ways for an
execution path to terminate:
a. Verification error is found. In this case update_branch_counts() is
not called and bpf_scc_visit existence does not matter.
b. Top level BPF_EXIT is reached. Exit instructions are never a part of
an SCC, so compute_scc_callchain() in maybe_exit_scc() will return
false and maybe_exit_scc() will return early.
c. A checkpoint is reached and matched. Checkpoints are created by
is_state_visited(), which calls maybe_enter_scc(), which allocates
bpf_scc_visit instances for checkpoints within SCCs.

Hence, for non-speculative symbolic execution paths there is no way to
reach a situation where maybe_exit_scc() is called for a state within
an SCC while no bpf_scc_visit instance exists for it.

However, the above logic falls short for speculative symbolic
execution paths, because verification errors (option (a) above) lead
to update_branch_counts() calls. And the test case above demonstrates
exactly that scenario.
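
The case analysis above can be written down as a small decision function. This is only an illustration of the reasoning in this mail, not verifier code: the enum and the function name are hypothetical, and it assumes that cases (b) and (c) behave the same for speculative and non-speculative paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three ways a path can end (a, b, c above). */
enum toy_path_end {
	TOY_END_ERROR,          /* (a) verification error */
	TOY_END_EXIT,           /* (b) top level BPF_EXIT reached */
	TOY_END_CHECKPOINT_HIT, /* (c) checkpoint reached and matched */
};

/*
 * Returns true if maybe_exit_scc() could be called for a state inside an
 * SCC for which no bpf_scc_visit instance was allocated.
 */
bool toy_missing_visit_possible(enum toy_path_end end, bool speculative)
{
	switch (end) {
	case TOY_END_ERROR:
		/*
		 * Non-speculative: update_branch_counts() is not called.
		 * Speculative: it is called, possibly before any checkpoint.
		 */
		return speculative;
	case TOY_END_EXIT:
		/* Exit is never part of an SCC, maybe_exit_scc() returns early. */
		return false;
	case TOY_END_CHECKPOINT_HIT:
		/* The matched checkpoint already allocated the visit. */
		return false;
	}
	assert(!"unknown toy_path_end value");
	return false;
}
```

The only combination that returns true is a verification error on a speculative path, which is the scenario the minimal reproducer hits.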

I'll send a patch disabling the bpf_scc_visit existence assertion for
speculative paths in the morning. Something along these lines:

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1950,6 +1950,8 @@ static int maybe_exit_scc(struct bpf_verifier_env *env, struct bpf_verifier_stat
 		return 0;
 	visit = scc_visit_lookup(env, callchain);
 	if (!visit) {
+		if (st->speculative)
+			return 0;
 		verifier_bug(env, "scc exit: no visit info for call chain %s",
 			     format_callchain(env, callchain));
 		return -EFAULT;