linux-kernel - Re: [syzbot] [bpf?] WARNING in maybe_exit

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <c42dd869d9ba23f14681448581a9c8c7ec23105b.camel@gmail.com>
Date: Tue, 16 Sep 2025 02:14:47 -0700
From: Eduard Zingerman <eddyz87@...il.com>
To: syzbot <syzbot+3afc814e8df1af64b653@...kaller.appspotmail.com>, 
	andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
 daniel@...earbox.net, 	haoluo@...gle.com, john.fastabend@...il.com,
 jolsa@...nel.org, kpsingh@...nel.org, 	linux-kernel@...r.kernel.org,
 martin.lau@...ux.dev, sdf@...ichev.me, 	song@...nel.org,
 syzkaller-bugs@...glegroups.com, yonghong.song@...ux.dev
Subject: Re: [syzbot] [bpf?] WARNING in maybe_exit_scc

On Mon, 2025-09-15 at 16:40 -0700, Eduard Zingerman wrote:
> On Mon, 2025-09-15 at 15:34 -0700, Eduard Zingerman wrote:
> 
> [...]
> 
> > > verifier bug: scc exit: no visit info for call chain (1)(1)
> > > WARNING: CPU: 1 PID: 6013 at kernel/bpf/verifier.c:1949 maybe_exit_scc+0x768/0x8d0 kernel/bpf/verifier.c:1949
> > 
> > Both this and [1] are reported for very similar programs:
> > 
> > <this>                                      <[1]>
> > --------------------------------------------------------------------------------------------
> > (b7) r0 = -1023213567                       (b7) r0 = -1023213567
> > (bf) r3 = r10                               (bf) r3 = r10
> > (07) r3 += -512                             (07) r3 += -504
> > (72) *(u8 *)(r10 -16) = -8                  (72) *(u8 *)(r10 -16) = -8
> > (71) r4 = *(u8 *)(r10 -16)                  (71) r4 = *(u8 *)(r10 -16)
> > (65) if r4 s> 0xff000000 goto pc+2          (65) if r4 s> 0xff000000 goto pc+2
> > (2d) if r0 > r4 goto pc+5                   (2d) if r0 > r4 goto pc+5
> > (20) r0 = *(u32 *)skb[60673]                (20) r0 = *(u32 *)skb[60673]
> > (7b) *(u64 *)(r3 +0) = r0                   (7b) *(u64 *)(r3 +0) = r0
> > (1d) if r4 == r4 goto pc+0                  (1d) if r4 == r4 goto pc+0
> > (7a) *(u64 *)(r10 -512) = -256              (7a) *(u64 *)(r10 -512) = -256
> > (db) lock *(u64 *)(r3 +0) |= r0             (db) r0 = atomic64_fetch_and((u64 *)(r3 +0), r0)
> > (b5) if r0 <= 0x0 goto pc-2                 (b5) if r0 <= 0x0 goto pc-2
> > (95) exit                                   (95) exit
> > 
> > So, I assume it's the same issue. Looking into it.
> > 
> > [1] https://lore.kernel.org/bpf/68c85b0d.050a0220.2ff435.03a5.GAE@google.com/T/#u
> 
> Minimal reproducer:
> 
>   SEC("socket")
>   __caps_unpriv(CAP_BPF)
>   __naked void syzbot_bug(void)
>   {
>         asm volatile (
>         "r0 = 100;"
>   "1:"
>         "*(u64 *)(r10 - 512) = r0;"
>         "if r0 <= 0x0 goto 1b;"
>         "exit;"
>         ::: __clobber_all);
>   }
> 
> And corresponding verifier log:
> 
>   Live regs before insn:
>         0: .......... (b7) r0 = 100
>     1   1: 0......... (7b) *(u64 *)(r10 -512) = r0
>     1   2: 0......... (b5) if r0 <= 0x0 goto pc-2
>         3: 0......... (95) exit
>   Global function syzbot_bug() doesn't return scalar. Only those are supported.
>   0: R1=ctx() R10=fp0
>   ; asm volatile ( @ verifier_and.c:118
>   0: (b7) r0 = 100                      ; R0_w=100
>   1: (7b) *(u64 *)(r10 -512) = r0       ; R0_w=100 R10=fp0 fp-512_w=100
>   2: (b5) if r0 <= 0x0 goto pc-2
>   mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1
>   mark_precise: frame0: regs=r0 stack= before 1: (7b) *(u64 *)(r10 -512) = r0
>   mark_precise: frame0: regs=r0 stack= before 0: (b7) r0 = 100
>   2: R0_w=100
>   3: (95) exit
> 
>   from 2 to 1 (speculative execution): R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
>   1: R0_w=scalar() R1=ctx() R10=fp0 fp-512_w=100
>   1: (7b) *(u64 *)(r10 -512) = r0
>   verifier bug: scc exit: no visit info for call chain (1)
>   processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> [...]

Here is what happens:
- Verification starts and reaches instruction (2) without creating
  any checkpoints.
- A speculative execution path for the mispredicted (taken) branch,
  i.e. the jump back to instruction (1), is pushed onto the states
  stack; the main execution path predicts the branch as not taken and
  continues to exit. Still no checkpoints.
- The speculative execution path is popped from the stack and proceeds
  from instruction (1).
- Speculative execution immediately terminates, because the verifier
  detects an infinite loop and signals an error.
- update_branch_counts() is called for the speculative execution
  state, and its branches count reaches zero.
- update_branch_counts() -> maybe_exit_scc() is called for a state
  whose insn_idx is in SCC #1.
- maybe_exit_scc() assumes that, when it is called for a state whose
  insn_idx is in some SCC, an instance of struct bpf_scc_visit has
  already been allocated for that SCC, which is not the case here.
  
Why does maybe_exit_scc() assume that a bpf_scc_visit instance exists?
When performing non-speculative symbolic execution, there are three
ways to terminate an execution path:
a. Verification error is found. In this case update_branch_counts() is
   not called and bpf_scc_visit existence does not matter.
b. Top-level BPF_EXIT is reached. Exit instructions are never part of
   an SCC, so compute_scc_callchain() in maybe_exit_scc() returns
   false and maybe_exit_scc() returns early.
c. A checkpoint is reached and matched. Checkpoints are created by
   is_state_visited(), which calls maybe_enter_scc(), which allocates
   bpf_scc_visit instances for checkpoints within SCCs.

Hence, on non-speculative symbolic execution paths, maybe_exit_scc()
can never be called for a state within an SCC for which no
bpf_scc_visit instance exists.

However, this reasoning falls short for speculative symbolic execution
paths: there, verification errors (case (a) above) do lead to
update_branch_counts() calls, and the test case above demonstrates
exactly that scenario.

I'll send a patch disabling the bpf_scc_visit existence assertion for
speculative paths in the morning. Something along these lines:

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1950,6 +1950,8 @@ static int maybe_exit_scc(struct bpf_verifier_env *env, struct bpf_verifier_stat
                return 0;
        visit = scc_visit_lookup(env, callchain);
        if (!visit) {
+               if (st->speculative)
+                       return 0;
                verifier_bug(env, "scc exit: no visit info for call chain %s",
                             format_callchain(env, callchain));
                return -EFAULT;