Message-ID: <CAEf4Bzbu3zuDcPj3ue8D6VCdMTw2PEREJBU42CbR1Pe=5qOrTQ@mail.gmail.com>
Date:   Thu, 28 Apr 2022 16:32:20 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>
Subject: Re: [RFC bpf-next 4/4] selftests/bpf: Add attach bench test

On Thu, Apr 28, 2022 at 1:05 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Thu, 28 Apr 2022 11:59:55 -0700
> Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
>
> > > The weak function gets a call to ftrace, but it still gets compiled into
> > > vmlinux but its symbol is dropped due to it being overridden. Thus, the
> > > mcount_loc finds this call to fentry, and maps it to the symbol that is
> > > before it, which just happened to be __bpf_tramp_exit.
> >
> > Ouch. That _is_ a bug in recordmcount.
>
> Exactly HOW is it a bug in recordmcount?
>
> The job of recordmcount is to create a section of all the locations that
> call fentry. That is EXACTLY what it did. No bug there! It did its job.

But that __fentry__ call is not part of __bpf_tramp_exit, actually.
Whether to call it a bug or limitation is secondary. It marks
__bpf_tramp_exit as attachable through kprobe/ftrace while it really
isn't.

Below you are saying there is only user confusion. It's not just
confusion. You'll get an error when you try to attach to
__bpf_tramp_exit, because __bpf_tramp_exit doesn't really have a
__fentry__ preamble and thus the kernel itself will reject it as a
target. So when you build a generic tracing tool that fetches all the
attachable kprobes and filters out all the blacklisted ones, you still
end up with kprobe targets that are not attachable. That is definitely
more than an inconvenience, and I have experienced it first hand.
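
To make the scenario concrete, here is a minimal Python sketch of that
filtering step. It assumes the tool works from the text of
available_filter_functions (one name per line, optionally followed by
"[module]") and of the kprobes blacklist (an address range followed by
a name). The helper names and the sample data are made up for
illustration; real tools may collect candidates differently.

```python
# Hypothetical sample contents; addresses and the blacklisted name are made up.
AVAILABLE_SAMPLE = """\
__bpf_tramp_exit
vfs_read
some_blacklisted_func
"""

BLACKLIST_SAMPLE = """\
0xffffffff81000000-0xffffffff81000040\tsome_blacklisted_func
"""


def parse_available(text):
    """Return function names, dropping any trailing "[module]" column."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def parse_blacklist(text):
    """Return names from "<start>-<end> <name>" lines."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names


available = parse_available(AVAILABLE_SAMPLE)
blacklisted = parse_blacklist(BLACKLIST_SAMPLE)

# Everything listed as available and not blacklisted looks attachable.
candidates = sorted(available - blacklisted)
print(candidates)
```

With this sample input __bpf_tramp_exit survives the filtering, so the
tool would still try to attach to it and only learn at attach time that
it cannot.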

Can recordmcount, or whoever does this, be taught to use the proper
FUNC symbol size to figure out the boundaries of the function?

$ readelf -s ~/linux-build/default/vmlinux | rg __bpf_tramp_exit
129408: ffffffff811b2ba0    63 FUNC    GLOBAL DEFAULT    1 __bpf_tramp_exit

So only the first 63 bytes of instructions starting at
__bpf_tramp_exit should be taken into account. Everything after that
doesn't belong to __bpf_tramp_exit. So even though objdump pretends
that call __fentry__ is part of __bpf_tramp_exit, it's not.

ffffffff811b2ba0 <__bpf_tramp_exit>:
ffffffff811b2ba0:       53                      push   %rbx
ffffffff811b2ba1:       48 89 fb                mov    %rdi,%rbx
ffffffff811b2ba4:       e8 97 d2 f2 ff          call   ffffffff810dfe40 <__rcu_read_lock>
ffffffff811b2ba9:       48 8b 83 e0 00 00 00    mov    0xe0(%rbx),%rax
ffffffff811b2bb0:       a8 03                   test   $0x3,%al
ffffffff811b2bb2:       75 0a                   jne    ffffffff811b2bbe <__bpf_tramp_exit+0x1e>
ffffffff811b2bb4:       65 48 ff 08             decq   %gs:(%rax)
ffffffff811b2bb8:       5b                      pop    %rbx
ffffffff811b2bb9:       e9 d2 0e f3 ff          jmp    ffffffff810e3a90 <__rcu_read_unlock>
ffffffff811b2bbe:       48 8b 83 e8 00 00 00    mov    0xe8(%rbx),%rax
ffffffff811b2bc5:       f0 48 83 28 01          lock subq $0x1,(%rax)
ffffffff811b2bca:       75 ec                   jne    ffffffff811b2bb8 <__bpf_tramp_exit+0x18>
ffffffff811b2bcc:       48 8b 83 e8 00 00 00    mov    0xe8(%rbx),%rax
ffffffff811b2bd3:       48 8d bb e0 00 00 00    lea    0xe0(%rbx),%rdi
ffffffff811b2bda:       ff 50 08                call   *0x8(%rax)
ffffffff811b2bdd:       eb d9                   jmp    ffffffff811b2bb8 <__bpf_tramp_exit+0x18>
ffffffff811b2bdf:       90                      nop

^^^ ffffffff811b2ba0 + 63 = ffffffff811b2bdf -- this is where
__bpf_tramp_exit ends; the nop at that address is already padding

ffffffff811b2be0:       e8 3b 9c e9 ff          call   ffffffff8104c820 <__fentry__>
ffffffff811b2be5:       b8 f4 fd ff ff          mov    $0xfffffdf4,%eax
ffffffff811b2bea:       c3                      ret
ffffffff811b2beb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
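
The check I have in mind is just a range test against the FUNC symbol
size. A minimal Python sketch using the addresses from the listing
above; owns_address is a hypothetical helper name, not an existing
kernel or tooling function.

```python
def owns_address(sym_start, sym_size, addr):
    """True if addr lies inside the half-open range [start, start + size)."""
    return sym_start <= addr < sym_start + sym_size


START = 0xffffffff811b2ba0        # __bpf_tramp_exit, from readelf
SIZE = 63                         # FUNC symbol size, from readelf
LAST_JMP = 0xffffffff811b2bdd     # final jmp inside __bpf_tramp_exit
FENTRY_CALL = 0xffffffff811b2be0  # call __fentry__ in the listing

print(owns_address(START, SIZE, LAST_JMP))     # True
print(owns_address(START, SIZE, FENTRY_CALL))  # False
```

With such a check the __fentry__ call site would not be attributed to
__bpf_tramp_exit, because it lies past the symbol's 63 bytes.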


>
> In fact, recordmcount probably didn't even get called. If you see this on
> x86 with gcc version greater than 8 (which I do), recordmcount is not even
> used. gcc creates this section internally instead.
>
> >
> > > I made that weak function "notrace" and the __bpf_tramp_exit disappeared
> > > from the available_filter_functions list.
> >
> > That's a hack. We cannot rely on such hacks for all weak functions.
>
> Then don't do anything. The only thing this bug causes is perhaps some
> confusion, because functions before weak functions that are overridden will
> be listed incorrectly in the available_filter_functions file. And that's
> because of the way it is created with respect to kallsyms.
>
> If you enable __bpf_tramp_exit, it will not do anything to that function.
> What it will do is enable the location inside of the weak function that no
> longer has its symbol shown.
>
> One solution is to simply get the end of the function that is provided by
> kallsyms to make sure the fentry call location is inside the function, and
> if it is not, then not show that function in available_filter_functions but
> instead show something like "** unnamed function **" or whatever.
>
> I could write a patch to do that when I get the time. But because the only
> issue that this causes is some confusion among the users and does not cause
> any issue with functionality, then it is low priority.
>
> -- Steve
