linux-kernel - Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check

Message-ID: <24a63d26171a49fa110fa7fff6d70f9e2b61a2fb.camel@gmail.com>
Date: Tue, 08 Jul 2025 10:39:00 -0700
From: Eduard Zingerman <eddyz87@...il.com>
To: Paul Chaignon <paul.chaignon@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, syzbot	
 <syzbot+c711ce17dd78e5d4fdcf@...kaller.appspotmail.com>, Andrii Nakryiko	
 <andrii@...nel.org>, Alexei Starovoitov <ast@...nel.org>, bpf	
 <bpf@...r.kernel.org>, Daniel Borkmann <daniel@...earbox.net>, Hao Luo	
 <haoluo@...gle.com>, John Fastabend <john.fastabend@...il.com>, Jiri Olsa	
 <jolsa@...nel.org>, KP Singh <kpsingh@...nel.org>, LKML	
 <linux-kernel@...r.kernel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
 Network Development <netdev@...r.kernel.org>, Stanislav Fomichev
 <sdf@...ichev.me>, Song Liu <song@...nel.org>,  syzkaller-bugs
 <syzkaller-bugs@...glegroups.com>, Yonghong Song <yonghong.song@...ux.dev>
Subject: Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check

On Tue, 2025-07-08 at 18:19 +0200, Paul Chaignon wrote:
> On Mon, Jul 07, 2025 at 05:57:32PM -0700, Eduard Zingerman wrote:
> > On Mon, 2025-07-07 at 17:51 -0700, Alexei Starovoitov wrote:
> > > On Mon, Jul 7, 2025 at 5:37 PM Eduard Zingerman <eddyz87@...il.com> wrote:
> > > > 
> > > > On Mon, 2025-07-07 at 16:29 -0700, Eduard Zingerman wrote:
> > > > > On Tue, 2025-07-08 at 00:30 +0200, Paul Chaignon wrote:
> 
> [...]
> 
> > > > But I think the program below would still be problematic:
> > > > 
> > > > SEC("socket")
> > > > __success
> > > > __retval(0)
> > > > __naked void jset_bug1(void)
> > > > {
> > > >         asm volatile ("                                 \
> > > >         call %[bpf_get_prandom_u32];                    \
> > > >         if r0 < 2 goto 1f;                              \
> > > >         r0 |= 1;                                        \
> > > >         if r0 & -2 goto 1f;                             \
> > > > 1:      r0 = 0;                                         \
> > > >         exit;                                           \
> > > > "       :
> > > >         : __imm(bpf_get_prandom_u32)
> > > >         : __clobber_all);
> > > > }
> > > > 
> > > > The possible_r0 would be changed by `if r0 & -2`, so new rule will not hit.
> > > > And the problem remains unsolved. I think we need to reset min/max
> > > > bounds in regs_refine_cond_op for JSET:
> > > > - in some cases range is more precise than tnum
> > > > - in these cases range cannot be compressed to a tnum
> > > > - predictions in jset are done for a tnum
> > > > - to avoid issues when narrowing tnum after prediction, forget the
> > > >   range.
> > > 
> > > You're digging too deep. llvm doesn't generate JSET insn,
> > > so this is syzbot only issue. Let's address it with minimal changes.
> > > Do not introduce fancy branch taken analysis.
> > > I would be fine with reverting this particular verifier_bug() hunk.
> 
> Ok, if LLVM doesn't generate JSETs, I agree there's not much point
> trying to reduce false positives. I like Eduard's solution below
> because it handles the JSET case without removing the warning. Given
> the number of crashes syzkaller is generating, I suspect this isn't
> only about JSET, so it'd be good to keep some visibility into invariant
> violations.

I suspect similar problems might be found in any place where tnum
operations are used to narrow the range. E.g. if a repro for JSET
is found, the same repro might be applicable to BPF_AND.

In general, it might be the case that we should not treat out-of-sync
bounds as an error. Assuming that tnum-based and bounds-based ranges
have different precision in different regions of the value scale,
situations where one is changed without changing the other can be
legitimate. E.g.:

                              ____ bounds range ____
                             /                      \
0 --------------------------------------------------------- MAX
    \___________________________________________________/
          tnum range

Narrowing only tnum:
                              ____ bounds range ____
                             /                      \
0 --------------------------------------------------------- MAX
    \___________________/
          tnum range

This does not indicate an error, only a difference in expressive power
for specific values.
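
To make the difference in expressive power concrete, here is a minimal
user-space sketch, not the kernel code. The names tnum_range_model,
fls64_model, tnum_umin, tnum_umax and tnum_count are hypothetical and
exist only for this illustration; tnum_range_model follows the usual
"fixed prefix, unknown suffix" construction and is an assumption about
how a range would be converted to a tnum.

```c
#include <stdint.h>

/* User-space model of a tristate number: bits set in mask are unknown,
 * all other bits are fixed to the corresponding bits of value. */
struct tnum { uint64_t value; uint64_t mask; };

/* Position of the highest set bit, counting from 1; 0 when x == 0. */
static int fls64_model(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Tnum with a fixed prefix and unknown suffix that covers [min, max]. */
static struct tnum tnum_range_model(uint64_t min, uint64_t max)
{
	uint64_t chi = min ^ max, delta;
	int bits = fls64_model(chi);

	if (bits > 63)	/* 1ULL << 64 would be undefined */
		return (struct tnum){ 0, UINT64_MAX };
	delta = (1ULL << bits) - 1;
	return (struct tnum){ min & ~delta, delta };
}

/* Smallest and largest value a tnum admits. */
static uint64_t tnum_umin(struct tnum t) { return t.value; }
static uint64_t tnum_umax(struct tnum t) { return t.value | t.mask; }

/* Number of values a tnum admits: 2^(unknown bits).
 * Only valid when fewer than 64 bits are unknown. */
static uint64_t tnum_count(struct tnum t)
{
	uint64_t m = t.mask, n = 1;

	while (m) {
		n <<= 1;
		m &= m - 1;	/* clear the lowest set bit */
	}
	return n;
}
```

With these helpers, the range [3, 6] (4 values) becomes the tnum
{value=0, mask=7}, which admits 8 values, so the range is more precise.
In the other direction, the tnum {value=1, mask=6} admits only 1, 3, 5
and 7, but the tightest range around it is [1, 7], so the tnum is more
precise. Neither representation can be converted to the other without
losing information.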

> > My point is that the fix should look as below (but extract it as a
> > utility function):
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 53007182b46b..b2fe665901b7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -16207,6 +16207,14 @@ static void regs_refine_cond_op(struct bpf_reg_state *reg1, struct bpf_reg_state
> >                         swap(reg1, reg2);
> >                 if (!is_reg_const(reg2, is_jmp32))
> >                         break;
> > +               reg1->u32_max_value = U32_MAX;
> > +               reg1->u32_min_value = 0;
> > +               reg1->s32_max_value = S32_MAX;
> > +               reg1->s32_min_value = S32_MIN;
> > +               reg1->umax_value = U64_MAX;
> > +               reg1->umin_value = 0;
> > +               reg1->smax_value = S64_MAX;
> > +               reg1->smin_value = S64_MIN;
> 
> Looks like __mark_reg_unbounded :)

I suspected there should be something already :)

> I can send a test case + __mark_reg_unbounded for BPF_JSET | BPF_X in
> regs_refine_cond_op. I suspect we may need the same for the BPF_JSET
> case as well, but I'm unable to build a repro for that so far.

Please go ahead.

> 
> >                 val = reg_const_value(reg2, is_jmp32);
> >                 if (is_jmp32) {
> >                         t = tnum_and(tnum_subreg(reg1->var_off), tnum_const(~val));
> > 
> > ----
> > 
> > Because of irreconcilable differences in what can be represented as a
> > tnum and what can be represented as a range.
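
The effect of forgetting the range before narrowing the tnum can be
sketched in user space on the jset_bug1 program quoted above. This is a
simplified model, not verifier code: struct reg_model, forget_range,
refine_jset_false, views_overlap and r0_before_jset are hypothetical
names, only the unsigned 64-bit bounds are modelled, and the starting
state (umin = 2, bit 0 known to be set) is an assumption about what is
tracked for r0 after `if r0 < 2` falls through and `r0 |= 1` runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits set in mask are unknown, the rest are fixed to value. */
struct tnum { uint64_t value; uint64_t mask; };

/* Simplified register: a tnum plus an unsigned 64-bit range. */
struct reg_model {
	struct tnum var_off;
	uint64_t umin, umax;
};

static struct tnum tnum_const(uint64_t v)
{
	return (struct tnum){ v, 0 };
}

static struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t alpha = a.value | a.mask;
	uint64_t beta = b.value | b.mask;
	uint64_t v = a.value & b.value;

	return (struct tnum){ v, alpha & beta & ~v };
}

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask | b.mask;

	return (struct tnum){ v, mu & ~v };
}

/* Hypothetical helper: drop the range, keep only the tnum. */
static void forget_range(struct reg_model *reg)
{
	reg->umin = 0;
	reg->umax = UINT64_MAX;
}

/* False branch of "if reg & val goto": every bit of val is zero in reg. */
static void refine_jset_false(struct reg_model *reg, uint64_t val, bool forget)
{
	if (forget)
		forget_range(reg);
	reg->var_off = tnum_and(reg->var_off, tnum_const(~val));
}

/* The two views are compatible if their [min, max] intervals overlap. */
static bool views_overlap(const struct reg_model *reg)
{
	uint64_t tmin = reg->var_off.value;
	uint64_t tmax = reg->var_off.value | reg->var_off.mask;

	return tmin <= reg->umax && reg->umin <= tmax;
}

/* Assumed state of r0 right before "if r0 & -2 goto 1f". */
static struct reg_model r0_before_jset(void)
{
	struct reg_model r0 = {
		.var_off = tnum_or((struct tnum){ 0, UINT64_MAX }, tnum_const(1)),
		.umin = 2,
		.umax = UINT64_MAX,
	};

	return r0;
}
```

Calling refine_jset_false(&r0, (uint64_t)-2, false) on that state turns
the tnum into the constant 1 while umin stays at 2, so views_overlap()
returns false; that is the kind of state the sanity check complains
about. With forget set to true the range is reset first and the two
views stay compatible, at the cost of losing the range information on
that branch.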