lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YV6Z0jcB8jdsG4Ly@krava>
Date:   Thu, 7 Oct 2021 08:55:14 +0200
From:   Jiri Olsa <jolsa@...hat.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
        Viktor Malik <vmalik@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: [RFC] store function address in BTF

On Wed, Oct 06, 2021 at 03:37:16PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 6, 2021 at 3:05 PM Jiri Olsa <jolsa@...hat.com> wrote:
> >
> > On Wed, Oct 06, 2021 at 02:22:28PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Oct 6, 2021 at 1:06 PM Jiri Olsa <jolsa@...hat.com> wrote:
> > > >
> > > > On Wed, Oct 06, 2021 at 09:17:39AM -0700, Andrii Nakryiko wrote:
> > > > > On Wed, Oct 6, 2021 at 1:42 AM Jiri Olsa <jolsa@...hat.com> wrote:
> > > > > >
> > > > > > hi,
> > > > > > I'm hitting performance issue and soft lock ups with the new version
> > > > > > of the patchset and the reason seems to be kallsyms lookup that we
> > > > > > need to do for each btf id we want to attach
> > > > > >
> > > > > > I tried to change kallsyms_lookup_name linear search into rbtree search,
> > > > > > but it has its own pitfalls like duplicate function names and it still
> > > > > > seems not to be fast enough when you want to attach like 30k functions
> > > > >
> > > > > How not fast enough is it exactly? How long does it take?
> > > >
> > > > 30k functions takes 75 seconds for me, it's loop calling bpf_check_attach_target
> > > >
> > > > getting soft lock up messages:
> > > >
> > > > krava33 login: [  168.896671] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [bpftrace:1087]
> > > >
> > >
> > > That's without RB tree right? I was curious about the case of you
> > > converting kallsyms to RB tree and it still being slow. Can't imagine
> > > 30k queries against RB tree with ~160k kallsyms taking 75 seconds.
> >
> > yep, that's the standard kallsyms lookup api
> >
> > I need to make some adjustment for rbtree kalsyms code, I think I found
> > a bug in there, so the numbers are probably better as you suggest
> 
> ok, cool, let's see what are the new numbers then
> 
> >
> > >
> > > But as I said, why not map BTF IDs into function names, sort function
> > > names, and then pass over kallsyms once, doing binary search into a
> > > sorted array of requested function names and then recording addr for
> > > each. Then check that you found addresses for all functions (it also
> > > leaves a question of what to do when we have multiple matching
> > > functions, but it's a problem with any approach). If everything checks
> > > out, you have a nice btf id -> func name -> func addr mapping. It's
> > > O(N log(M)), which sounds like it shouldn't be slow. Definitely not
> > > multiple seconds slow.
> >
> > ok, now that's clear to me, thanks for these details
> 
> great
> 
> >
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > so I wonder we could 'fix this' by storing function address in BTF,
> > > > > > which would cut kallsyms lookup completely, because it'd be done in
> > > > > > compile time
> > > > > >
> > > > > > my first thought was to add extra BTF section for that, after discussion
> > > > > > with Arnaldo perhaps we could be able to store extra 8 bytes after
> > > > > > BTF_KIND_FUNC record, using one of the 'unused' bits in btf_type to
> > > > > > indicate that? or new BTF_KIND_FUNC2 type?
> > > > > >
> > > > > > thoughts?
> > > > >
> > > > > I'm strongly against this, because (besides the BTF bloat reason) we
> > > > > need similar mass attachment functionality for kprobe/kretprobe and
> > > > > that one won't be relying on BTF FUNCs, so I think it's better to
> > > > > stick to the same mechanism for figuring out the address of the
> > > > > function.
> > > >
> > > > ok
> > > >
> > > > >
> > > > > If RB tree is not feasible, we can do a linear search over unsorted
> > > > > kallsyms and do binary search over sorted function names (derived from
> > > > > BTF IDs). That would be O(Nlog(M)), where N is number of ksyms, M is
> > > > > number of BTF IDs/functions-to-be-attached-to. If we did have an RB
> > > > > tree for kallsyms (is it hard to support duplicates? why?) it could be
> > > > > even faster O(Mlog(N)).
> > > >
> > > > I had issues with generic kallsyms rbtree in the post some time ago,
> > > > I'll revisit it to check on details.. but having the tree with just
> > > > btf id functions might clear that.. I'll check
> > >
> > > That's not what I'm proposing. See above. Please let me know if
> > > something is not clear before going all in for RB tree implementation
> > > :)
> > >
> > >
> > > But while we are on topic, do you think (with ftrace changes you are
> > > doing) it would be hard to support multi-attach for
> > > kprobes/kretprobes? We now have bpf_link interface for attaching
> > > kprobes, so API can be pretty aligned with fentry/fexit, except
> > > instead of btf IDs we'd need to pass array of pointers of C strings, I
> > > suppose.
> >
> > hum, I think kprobe/kretprobe is made of perf event (kprobe/kretprobe
> > pmus), then you pass event fd and program fd to bpf link syscall,
> > and it attaches bpf program to that perf event
> >
> > so perhaps the user interface would be array of perf events fds and prog fd
> >
> > also I think you can have just one probe for function, so we will not need
> > to share kprobes for multiple users like we need for trampolines, so the
> > attach logic will be simple
> 
> creating thousands of perf_event FDs seems expensive (and you'll be
> running into the limit of open files pretty soon in a lot of systems).
> So I think for multi-attach we'll have to have a separate way where
> you'd specify kernel function name (and maybe also offset). I'm just
> saying that having BPF_LINK_CREATE command, it's easier (probably) to
> extend this for kprobe multi-attach, than trying to retrofit this into
> perf_event_open.

ah true.. I wonder we could bypass perf by using directly kernel
kprobe interface

jirka

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ