netdev - Re: [bpf-next PATCH] bpf: libbpf, support older style kprobe load

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4Bza7Sz-Rk=9fD70bKvUY=_BtbTUoyobkHB=pLqQijgVkbA@mail.gmail.com>
Date:   Tue, 22 Oct 2019 09:40:22 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Andrii Nakryiko <andriin@...com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [bpf-next PATCH] bpf: libbpf, support older style kprobe load

On Mon, Oct 21, 2019 at 10:08 PM John Fastabend
<john.fastabend@...il.com> wrote:
>
> Andrii Nakryiko wrote:
> > On Sat, Oct 19, 2019 at 1:30 AM John Fastabend <john.fastabend@...il.com> wrote:
> > >
> > > Following ./Documentation/trace/kprobetrace.rst add support for loading
> > > kprobes programs on older kernels.
> > >
> >
> > My main concern with this is that this code is born bit-rotten,
> > because selftests are never testing the legacy code path. How did you
> > think about testing this and ensuring that this keeps working going
> > forward?
> >
>
> Well we use it, but I see your point and actually I even broke the retprobe
> piece hastily fixing merge conflicts in this patch. When I ran tests on it
> I missed running retprobe tests on the set of kernels that would hit that
> code.
>
> > > Signed-off-by: John Fastabend <john.fastabend@...il.com>
> > > ---
> > >  tools/lib/bpf/libbpf.c |   81 +++++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 73 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index fcea6988f962..12b3105d112c 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -5005,20 +5005,89 @@ static int determine_uprobe_retprobe_bit(void)
> > >         return parse_uint_from_file(file, "config:%d\n");
> > >  }
> > >
> > > +static int use_kprobe_debugfs(const char *name,
> > > +                             uint64_t offset, int pid, bool retprobe)
> > > +{
> > > +       const char *file = "/sys/kernel/debug/tracing/kprobe_events";
> > > +       int fd = open(file, O_WRONLY | O_APPEND, 0);
> > > +       char buf[PATH_MAX];
> > > +       int err;
> > > +
> > > +       if (fd < 0) {
> > > +               pr_warning("failed open kprobe_events: %s\n",
> > > +                          strerror(errno));
> > > +               return -errno;
> >
> > errno after pr_warning() call might be clobbered, you need to save it
> > locally first
>
> Seems so thanks.
>
> >
> > > +       }
> > > +
> > > +       snprintf(buf, sizeof(buf), "%c:kprobes/%s %s",
> > > +                retprobe ? 'r' : 'p', name, name);
> >
> > remember result, check it to detect overflow, and use it in write below?
>
> sure seems more robust. If someone has names longer than PATH_MAX though
> it seems a bit much.
>
> >
> > > +
> > > +       err = write(fd, buf, strlen(buf));
> > > +       close(fd);
> > > +       if (err < 0)
> > > +               return -errno;
> > > +       return 0;
> > > +}
> > > +
> > >  static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
> > >                                  uint64_t offset, int pid)
> > >  {
> > >         struct perf_event_attr attr = {};
> > >         char errmsg[STRERR_BUFSIZE];
> > > +       uint64_t config1 = 0;
> > >         int type, pfd, err;
> > >
> > >         type = uprobe ? determine_uprobe_perf_type()
> > >                       : determine_kprobe_perf_type();
> > >         if (type < 0) {
> > > -               pr_warning("failed to determine %s perf type: %s\n",
> > > -                          uprobe ? "uprobe" : "kprobe",
> > > -                          libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
> > > -               return type;
> > > +               if (uprobe) {
> > > +                       pr_warning("failed to determine uprobe perf type %s: %s\n",
> > > +                                  name,
> > > +                                  libbpf_strerror_r(type,
> > > +                                                    errmsg, sizeof(errmsg)));
> > > +               } else {
> >
> > I think this legacy kprobe setup deserves its own function (maybe even
> > combine it with use_kprobe_debugfs to do entire attribute setup from A
> > to Z?)
> >
> > These 2 levels of nestedness is also unfortunate, how about having two
> > functions that are filling out perf_event_attr? Something like:
> >
> > err = determine_perf_probe_attr(...)
> > if (err)
> >     err = determine_legacy_probe_attr(...)
> > if (!err)
> >     <bail out>
> > do perf call here
> >
>
> Perhaps it makes sense to even uplevel this into the API? Something like
> bpf_program__attach_legacy_kprobe() then we could test it easer?

Having this as a separate API is bad from usability standpoint,
because we force user to either know which kernel versions support new
vs old ways, or have to write a "try this, if fails - try that" logic,
which is ugly. So I think we should hide this from user.

I'm still thinking what's the least intrusive way to cause new way to
fail on modern kernels, so that we can guarantee that legacy code path
is executed. Some way of forcing determine_kprobe_perf_type() to fail
would be easiest.

>
> >
> > > +                       /* If we do not have an event_source/../kprobes then we
> > > +                        * can try to use kprobe-base event tracing, for details
> > > +                        * see ./Documentation/trace/kprobetrace.rst
> > > +                        */
> > > +                       const char *file = "/sys/kernel/debug/tracing/events/kprobes/";
> > > +                       char c[PATH_MAX];
> >
> > what does c stand for?
>
> Can name it file and push the path into snprintf() below.

sounds good

>
> >
> > > +                       int fd, n;
> > > +
> > > +                       snprintf(c, sizeof(c), "%s/%s/id", file, name);
> >
> > check result? also, is there a reason to not use
> > "/sys/kernel/debug/tracing/events/kprobes/" directly in format string?
> >
>
> I believe when I wrote that I just like the it as a const char better.
> But no reason if you like it inline better.
>
> > > +
> > > +                       err = use_kprobe_debugfs(name, offset, pid, retprobe);
> > > +                       if (err)
> > > +                               return err;
> > > +
> > > +                       type = PERF_TYPE_TRACEPOINT;
> > > +                       fd = open(c, O_RDONLY, 0);
> > > +                       if (fd < 0) {
> > > +                               pr_warning("failed to open tracepoint %s: %s\n",
> > > +                                          c, strerror(errno));
> > > +                               return -errno;
> > > +                       }
> > > +                       n = read(fd, c, sizeof(c));
> > > +                       close(fd);
> > > +                       if (n < 0) {
> > > +                               pr_warning("failed to read %s: %s\n",
> > > +                                          c, strerror(errno));
> >
> > It's a bit fishy that you are reading into c and then print out c on
> > error. Its contents might be corrupted at this point.
> >
> > And same thing about errno as above.
>
> Sure just didn't see much point in using yet another buffer. We can
> just make the pr_warning general enough that the buffer isn't needed.

just use parse_uint_from_file(), as long as we don't expect to read
negative numbers, it will do what you want (and that's what we already
use for new-style kprobe/uprobe).

>
> >
> > > +                               return -errno;
> > > +                       }
> > > +                       c[n] = '\0';
> > > +                       config1 = strtol(c, NULL, 0);
> >
> > no need for config1 variable, just attr.config = strtol(...)?
>
> yes, fallout from some code churn. But no reason for it.
>
> >
> > > +                       attr.size = sizeof(attr);
> > > +                       attr.type = type;
> > > +                       attr.config = config1;
> > > +                       attr.sample_period = 1;
> > > +                       attr.wakeup_events = 1;
> > > +               }
> > > +       } else {
> > > +               config1 = ptr_to_u64(name);
> >
> > same, just straight attr.config1 = ... ?
>
> yes.
>
> >
> > > +               attr.size = sizeof(attr);
> > > +               attr.type = type;
> > > +               attr.config1 = config1; /* kprobe_func or uprobe_path */
> > > +               attr.config2 = offset;  /* kprobe_addr or probe_offset */
> > >         }
> > >         if (retprobe) {
> > >                 int bit = uprobe ? determine_uprobe_retprobe_bit()
> > > @@ -5033,10 +5102,6 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
> > >                 }
> > >                 attr.config |= 1 << bit;
> > >         }
> > > -       attr.size = sizeof(attr);
> > > -       attr.type = type;
> > > -       attr.config1 = ptr_to_u64(name); /* kprobe_func or uprobe_path */
> > > -       attr.config2 = offset;           /* kprobe_addr or probe_offset */
> > >
> > >         /* pid filter is meaningful only for uprobes */
> > >         pfd = syscall(__NR_perf_event_open, &attr,
> > >
> >
> > What about the detaching? Would closing perf event FD be enough?
> > Wouldn't we need to clear a probe with -:<event>?
>
> It seems to be enough to close the fd. From an API standpoint might
> make sense to have a detach() though if adding an attach().

we do have detach, it's bpf_link__destroy()

BCC certainly deletes kprobe:
https://github.com/iovisor/bcc/blob/master/src/cc/libbpf.c#L1142

BTW, if we are adding legacy stuff, we should probably also add uprobe
support, no? See BCC for inspiration:
https://github.com/iovisor/bcc/blob/master/src/cc/libbpf.c#L975

One thing that's not clear, and maybe Yonghong can clarify, is why do
we need to enter mount_ns. I vaguely remember this has problems for
multi-threaded applications. This aspect libbpf can't and shouldn't
control, so I wonder if we can avoid mount_ns?..