linux-kernel - Re: WARNING in __mark_chain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7137c191-49c6-7f57-95bc-ee15c7af3c28@meta.com>
Date:   Wed, 4 Jan 2023 08:51:38 -0800
From:   Yonghong Song <yhs@...a.com>
To:     Hao Sun <sunhao.th@...il.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Stanislav Fomichev <sdf@...gle.com>,
        Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        John Fastabend <john.fastabend@...il.com>,
        Martin KaFai Lau <martin.lau@...ux.dev>,
        Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
        Jiri Olsa <jolsa@...nel.org>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: WARNING in __mark_chain_precision



On 1/4/23 12:52 AM, Hao Sun wrote:
> 
> 
>> On 4 Jan 2023, at 1:47 PM, Yonghong Song <yhs@...a.com> wrote:
>>
>>
>>
>> On 1/3/23 10:27 AM, Alexei Starovoitov wrote:
>>> On Mon, Jan 2, 2023 at 1:42 AM Hao Sun <sunhao.th@...il.com> wrote:
>>>>
>>>>
>>>>
>>>> Yonghong Song <yhs@...a.com> 于2023年1月2日周一 03:20写道：
>>>>>
>>>>>
>>>>>
>>>>> On 12/30/22 1:44 AM, Hao Sun wrote:
>>>>>>
>>>>>>
>>>>>> Andrii Nakryiko <andrii.nakryiko@...il.com> 于2022年12月30日周五 06:16写道：
>>>>>>>
>>>>>>> On Tue, Dec 27, 2022 at 9:24 PM Yonghong Song <yhs@...a.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/20/22 4:30 PM, Andrii Nakryiko wrote:
>>>>>>>>> On Mon, Dec 19, 2022 at 11:13 AM <sdf@...gle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 12/19, Hao Sun wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>> The following backtracking bug can be triggered on the latest bpf-next and
>>>>>>>>>>> Linux 6.1 with the C prog provided. I don't have enough knowledge about
>>>>>>>>>>> this part in the verifier, don't know how to fix this.
>>>>>>>>>>
>>>>>>>>>> Maybe something related to commit be2ef8161572 ("bpf: allow precision
>>>>>>>>>> tracking
>>>>>>>>>> for programs with subprogs") and/or the related ones?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> This can be reproduced on:
>>>>>>>>>>
>>>>>>>>>>> HEAD commit: 0e43662e61f2 tools/resolve_btfids: Use pkg-config to locate
>>>>>>>>>>> libelf
>>>>>>>>>>> git tree: bpf-next
>>>>>>>>>>> console log: https://pastebin.com/raw/45hZ7iqm
>>>>>>>>>>> kernel config: https://pastebin.com/raw/0pu1CHRm
>>>>>>>>>>> C reproducer: https://pastebin.com/raw/tqsiezvT
>>>>>>>>>>
>>>>>>>>>>> func#0 @0
>>>>>>>>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>>>>>>>>> 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
>>>>>>>>>>> 2: (18) r6 = 0xffff888027358000       ;
>>>>>>>>>>> R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>>>>>>>>> 4: (18) r7 = 0xffff88802735a000       ;
>>>>>>>>>>> R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>>>>>>>>>> 6: (18) r8 = 0xffff88802735e000       ;
>>>>>>>>>>> R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>>>>>>>>>> 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
>>>>>>>>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>>>>>>>>> last_idx 10 first_idx 0
>>>>>>>>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>>>>>>>>> 11: R9_w=156779191205888
>>>>>>>>>>> 11: (85) call #0
>>>>>>>>>>> 12: (cc) w2 s>>= w7
>>>>>>>>>
>>>>>>>>> w2 should have been set to NOT_INIT (because r1-r5 are clobbered by
>>>>>>>>> calls) and rejected here as !read_ok (see check_reg_arg()) before
>>>>>>>>> attempting to mark precision for r2. Can you please try to debug and
>>>>>>>>> understand why that didn't happen here?
>>>>>>>>
>>>>>>>> The verifier is doing the right thing here and the 'call #0' does
>>>>>>>> implicitly cleared r1-r5.
>>>>>>>>
>>>>>>>> So for 'w2 s>>= w7', since w2 is used, the verifier tries to find
>>>>>>>> its definition by backtracing. It encountered 'call #0', which clears
>>>>>>>
>>>>>>> and that's what I'm saying is incorrect. Normally we'd get !read_ok
>>>>>>> error because s>>= is both READ and WRITE on w2, which is
>>>>>>> uninitialized after call instruction according to BPF ABI. And that's
>>>>>>> what actually seems to happen correctly in my (simpler) tests locally.
>>>>>>> But something is special about this specific repro that somehow either
>>>>>>> bypasses this logic, or attempts to mark precision before we get to
>>>>>>> that test. That's what we should investigate. I haven't tried to run
>>>>>>> this specific repro locally yet, so can't tell for sure.
>>>>>>>
>>>>>>
>>>>>> So, the reason why w2 is not marked as uninit is that the kfunc call in
>>>>>> the BPF program is invalid, "call #0", imm is zero, right?
>>>>>
>>>>> Yes, "call #0" is invalid. As the code below
>>>>>
>>>>>> /* skip for now, but return error when we find this in
>>>>> fixup_kfunc_call */
>>>>>>   if (!insn->imm)
>>>>>>   return 0;
>>>>>
>>>>> The error report will be delayed later in fixup_kfunc_call().
>>>>>
>>>>> static int fixup_kfunc_call(struct bpf_verifier_env *env, struct
>>>>> bpf_insn *insn,
>>>>>                              struct bpf_insn *insn_buf, int insn_idx,
>>>>> int *cnt)
>>>>> {
>>>>>          const struct bpf_kfunc_desc *desc;
>>>>>
>>>>>          if (!insn->imm) {
>>>>>                  verbose(env, "invalid kernel function call not
>>>>> eliminated in verifier pass\n");
>>>>>                  return -EINVAL;
>>>>>          }
>>>>>
>>>>>
>>>>>> In check_kfunc_call(), it skips this error temporarily:
>>>>>>
>>>>>> /* skip for now, but return error when we find this in fixup_kfunc_call */
>>>>>>   if (!insn->imm)
>>>>>>   return 0;
>>>>>>
>>>>>> So the kfunc call is the previous instruction before "w2 s>>= w7", this
>>>>>> leads to the warning in backtrack_insn():
>>>>>>
>>>>>> /* regular helper call sets R0 */
>>>>>> *reg_mask &= ~1;
>>>>>> if (*reg_mask & 0x3f) {
>>>>>>       /* if backtracing was looking for registers R1-R5
>>>>>>       * they should have been found already.
>>>>>>       */
>>>>>>       verbose(env, "BUG regs %x\n", *reg_mask);
>>>>>>       WARN_ONCE(1, "verifier backtracking bug”);
>>>>>>       return -EFAULT;
>>>>>> }
>>>>>
>>>>> The main triggering the backtrack_insn() is due to
>>>>>
>>>>>                          } else {
>>>>>                                  /* scalar += pointer
>>>>>                                   * This is legal, but we have to
>>>>> reverse our
>>>>>                                   * src/dest handling in computing the range
>>>>>                                   */
>>>>>                                  err = mark_chain_precision(env,
>>>>> insn->dst_reg);
>>>>>                                  if (err)
>>>>>                                          return err;
>>>>>                                  return adjust_ptr_min_max_vals(env, insn,
>>>>>                                                                 src_reg,
>>>>> dst_reg);
>>>>>                          }
>>>>>
>>>>>
>>>>> unc#0 @0
>>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>>> 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
>>>>> 2: (18) r6 = 0xffff888100d29000       ;
>>>>> R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>>> 4: (18) r7 = 0xffff888100d2a000       ;
>>>>> R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>>>> 6: (18) r8 = 0xffff888100d2ac00       ;
>>>>> R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>>>> 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
>>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>>> last_idx 10 first_idx 0
>>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>>> 11: R9_w=156779191205888
>>>>> 11: (85) call #0
>>>>> 12: (cc) w2 s>>= w7
>>>>> last_idx 12 first_idx 12
>>>>> parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0)
>>>>> R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>>> R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,v0
>>>>> last_idx 11 first_idx 0
>>>>> regs=4 stack=0 before 11: (85) call #0
>>>>> BUG regs 4
>>>>>
>>>>> For insn 12, 'w2 s>>= w7', w2 is a scalar and w7 is a map_ptr. Hence,
>>>>> based on the above verifier code, mark_chain_precision() is triggered.
>>>>>
>>>>> Not sure what is the purpose of this test. But to make it succeed,
>>>>> first "call #0" need to change to a valid kfunc call, and second, you
>>>>> might want to change 'w2 s>>= w7' to e.g., 'w9 s>>= w7' to avoid
>>>>> precision tracking.
>>>>>
>>>>
>>>> The purpose is not to make the test "succeed", the verifier temporarily
>>>> skips the invalid kfunc insn "call #0", but this insn triggered a warning
>>>> in backtrack_insn(), while it is supposed to reject the program either
>>>> due to insn#12 32bit ptr alu or insn#11 invalid kfunc.
>>>>
>>>> Maybe something like the bellow, after applying the patch, the reproducer
>>>> is rejected:
>>>>
>>>> func#0 @0
>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
>>>> 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
>>>> 2: (18) r6 = 0xffff88817d563000       ; R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
>>>> 4: (18) r7 = 0xffff888171ee9000       ; R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
>>>> 6: (18) r8 = 0xffff888171ee8000       ; R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
>>>> 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>> last_idx 10 first_idx 0
>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>> 11: R9_w=156779191205888
>>>> 11: (85) call #0
>>>> 12: (cc) w2 s>>= w7
>>>> last_idx 12 first_idx 12
>>>> parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0) R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0) R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0) R9_w=156779191205888 R10=fp0
>>>> last_idx 11 first_idx 0
>>>> regs=4 stack=0 before 11: (85) call #0
>>>> regs=4 stack=0 before 10: (36) if w9 >= 0xffffffe3 goto pc+1
>>>> regs=4 stack=0 before 8: (18) r9 = 0x8e9700000000
>>>> regs=4 stack=0 before 6: (18) r8 = 0xffff888171ee8000
>>>> regs=4 stack=0 before 4: (18) r7 = 0xffff888171ee9000
>>>> regs=4 stack=0 before 2: (18) r6 = 0xffff88817d563000
>>>> regs=4 stack=0 before 0: (18) r2 = 0x8000000000000
>>>> R2 32-bit pointer arithmetic prohibited
>>>> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 4a25375ebb0d..abc7e96d826f 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -2743,6 +2743,9 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
>>>>                          *reg_mask |= sreg;
>>>>          } else if (class == BPF_JMP || class == BPF_JMP32) {
>>>>                  if (opcode == BPF_CALL) {
>>>> +                       /* skip for now, should return error when we find this in fixup_kfunc_call */
>>>> +                       if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && insn->imm == 0)
>>>> +                               return 0;
>>> Makes sense to me. Please submit as an official patch
>>> with s/return 0/return -ENOTSUPP/
>>> Also 'skip for now' isn't quite correct here.
>>> In check_kfunc_call() it's correct, since invalid kfunc with imm==0
>>> could be eliminated during dead code elimination,
>>> but since we're walking this insn here in backtrack_insn
>>> the dead code elimination is not going to kick in.
>>> So it's surely invalid kfunc call if we see it in backtrack_insn.
>>> The comment should probably be something like:
>>> /* kfunc with imm==0 is invalid and fixup_kfunc_call will catch
>>> this error later. Make backtracking conservative with ENOTSUPP. */
>>
>> Do we have the same issue if we have
>>    call #1 <or some valid kfunc>
>> instead of
>>    call #0
>> ?
> 
> Seems no. If that happened, then, it’s another bug. Because as Andrii commented,
> the prog should be rejected due to !read_ok before backing track to the kfunc
> call. In this particular case, the invalid kfunc call is skipped without marking
> Regs as uninitiated so the verifier incorrectly backtrack to that invalid call.

Make sense. Thanks for double checking!