[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c677db23-1680-6fcb-1629-0a93f60bf2c8@fb.com>
Date: Wed, 13 May 2020 08:28:56 -0700
From: Yonghong Song <yhs@...com>
To: "Daniel T. Lee" <danieltimlee@...il.com>
CC: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
John Fastabend <john.fastabend@...il.com>
Subject: Re: [PATCH bpf-next 1/3] samples: bpf: refactor kprobe tracing user
progs with libbpf
On 5/12/20 11:51 PM, Daniel T. Lee wrote:
> On Wed, May 13, 2020 at 10:40 AM Yonghong Song <yhs@...com> wrote:
>>
>>
>>
>> On 5/12/20 7:43 AM, Daniel T. Lee wrote:
>>> Currently, the kprobe BPF program attachment method for bpf_load is
>>> quite old. The implementation of bpf_load "directly" controls and
>>> manages(create, delete) the kprobe events of DEBUGFS. On the other hand,
>>> using using the libbpf automatically manages the kprobe event.
>>> (under bpf_link interface)
>>>
>>> By calling bpf_program__attach(_kprobe) in libbpf, the corresponding
>>> kprobe is created and the BPF program will be attached to this kprobe.
>>> To remove this, by simply invoking bpf_link__destroy will clean up the
>>> event.
>>>
>>> This commit refactors kprobe tracing programs (tracex{1~7}_user.c) with
>>> libbpf using bpf_link interface and bpf_program__attach.
>>>
>>> tracex2_kern.c, which tracks system calls (sys_*), has been modified to
>>> append prefix depending on architecture.
>>>
>>> Signed-off-by: Daniel T. Lee <danieltimlee@...il.com>
>>> ---
>>> samples/bpf/Makefile | 12 +++----
>>> samples/bpf/tracex1_user.c | 41 ++++++++++++++++++++----
>>> samples/bpf/tracex2_kern.c | 8 ++++-
>>> samples/bpf/tracex2_user.c | 55 ++++++++++++++++++++++++++------
>>> samples/bpf/tracex3_user.c | 65 ++++++++++++++++++++++++++++----------
>>> samples/bpf/tracex4_user.c | 55 +++++++++++++++++++++++++-------
>>> samples/bpf/tracex6_user.c | 53 +++++++++++++++++++++++++++----
>>> samples/bpf/tracex7_user.c | 43 ++++++++++++++++++++-----
>>> 8 files changed, 268 insertions(+), 64 deletions(-)
>>>
>>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
>>> index 424f6fe7ce38..4c91e5914329 100644
>>> --- a/samples/bpf/Makefile
>>> +++ b/samples/bpf/Makefile
>>> @@ -64,13 +64,13 @@ fds_example-objs := fds_example.o
>>> sockex1-objs := sockex1_user.o
>>> sockex2-objs := sockex2_user.o
>>> sockex3-objs := bpf_load.o sockex3_user.o
>>> -tracex1-objs := bpf_load.o tracex1_user.o $(TRACE_HELPERS)
>>> -tracex2-objs := bpf_load.o tracex2_user.o
>>> -tracex3-objs := bpf_load.o tracex3_user.o
>>> -tracex4-objs := bpf_load.o tracex4_user.o
>>> +tracex1-objs := tracex1_user.o $(TRACE_HELPERS)
>>> +tracex2-objs := tracex2_user.o
>>> +tracex3-objs := tracex3_user.o
>>> +tracex4-objs := tracex4_user.o
>>> tracex5-objs := bpf_load.o tracex5_user.o $(TRACE_HELPERS)
>>> -tracex6-objs := bpf_load.o tracex6_user.o
>>> -tracex7-objs := bpf_load.o tracex7_user.o
>>> +tracex6-objs := tracex6_user.o
>>> +tracex7-objs := tracex7_user.o
>>> test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o
>>> trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS)
>>> lathist-objs := bpf_load.o lathist_user.o
>>> diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c
>>> index 55fddbd08702..1b15ab98f7d3 100644
>>> --- a/samples/bpf/tracex1_user.c
>>> +++ b/samples/bpf/tracex1_user.c
>>> @@ -1,21 +1,45 @@
>>> // SPDX-License-Identifier: GPL-2.0
>>> #include <stdio.h>
>>> -#include <linux/bpf.h>
>>> #include <unistd.h>
>>> -#include <bpf/bpf.h>
>>> -#include "bpf_load.h"
>>> +#include <bpf/libbpf.h>
>>> #include "trace_helpers.h"
>>>
>>> +#define __must_check
>>
>> This is not very user friendly.
>> Maybe not including linux/err.h and
>> use libbpf API libbpf_get_error() instead?
>>
>
> This approach looks more apparent and can stick with the libbpf API.
> I'll update code using this way.
>
>>> +#include <linux/err.h>
>>> +
>>> int main(int ac, char **argv)
>>> {
>>> - FILE *f;
>>> + struct bpf_link *link = NULL;
>>> + struct bpf_program *prog;
>>> + struct bpf_object *obj;
>>> char filename[256];
>>> + FILE *f;
>>>
>>> snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
>>> + obj = bpf_object__open_file(filename, NULL);
>>> + if (IS_ERR(obj)) {
>>> + fprintf(stderr, "ERROR: opening BPF object file failed\n");
>>> + obj = NULL;
>>> + goto cleanup;
>>
>> You do not need to goto cleanup, directly return 0 is okay here.
>> The same for other files in this patch.
>>
>
> As you said, it would be better to return right away than to proceed
> any further. I'll apply the code at next patch.
>
>>> + }
>>> +
>>> + prog = bpf_object__find_program_by_name(obj, "bpf_prog1");
>>> + if (!prog) {
>>> + fprintf(stderr, "ERROR: finding a prog in obj file failed\n");
>>> + goto cleanup;
>>> + }
>>> +
>>> + /* load BPF program */
>>> + if (bpf_object__load(obj)) {
>>> + fprintf(stderr, "ERROR: loading BPF object file failed\n");
>>> + goto cleanup;
>>> + }
>>>
>>> - if (load_bpf_file(filename)) {
>>> - printf("%s", bpf_log_buf);
>>> - return 1;
>>> + link = bpf_program__attach(prog);
>>> + if (IS_ERR(link)) {
>>> + fprintf(stderr, "ERROR: bpf_program__attach failed\n");
>>> + link = NULL;
>>> + goto cleanup;
>>> }
>>>
>>> f = popen("taskset 1 ping -c5 localhost", "r");
>>> @@ -23,5 +47,8 @@ int main(int ac, char **argv)
>>>
>>> read_trace_pipe();
>>>
>>> +cleanup:
>>> + bpf_link__destroy(link);
>>> + bpf_object__close(obj);
>>
>> Typically in kernel, we do multiple labels for such cases
>> like
>> destroy_link:
>> bpf_link__destroy(link);
>> close_object:
>> bpf_object__close(obj);
>>
>> The error path in the main() function jumps to proper label.
>> This is more clean and less confusion.
>>
>> The same for other cases in this file.
>>
>
> I totally agree that multiple labels are much more intuitive.
> But It's not very common to jump to the destroy_link label.
>
> Either when on the routine is completed successfully and jumps to the
> destroy_link branch, or an error occurred while bpf_program__attach
> was called "several" times and jumps to the destroy_link branch.
>
> Single bpf_program__attach like this tracex1 sample doesn't really have
> to destroy link, since the link has been set to NULL on attach error and
> bpf_link__destroy() is designed to do nothing if passed NULL to it.
>
> So I think current approach will keep consistent between samples since
> most of the sample won't need to jump to destroy_link.
Since this is the sample code, I won't enforce that. So yes, you can
keep your current approach.
>
>>> return 0;
>>> }
>>> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
>>> index d865bb309bcb..ff5d00916733 100644
>>> --- a/samples/bpf/tracex2_kern.c
>>> +++ b/samples/bpf/tracex2_kern.c
>>> @@ -11,6 +11,12 @@
>>> #include <bpf/bpf_helpers.h>
>>> #include <bpf/bpf_tracing.h>
>>>
>>> +#ifdef __x86_64__
>>> +#define SYSCALL "__x64_"
>>> +#else
>>> +#define SYSCALL
>>> +#endif
>>
>> See test_progs.h, one more case to handle:
>> #ifdef __x86_64__
>> #define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep"
>> #elif defined(__s390x__)
>> #define SYS_NANOSLEEP_KPROBE_NAME "__s390x_sys_nanosleep"
>> #else
>> #define SYS_NANOSLEEP_KPROBE_NAME "sys_nanosleep"
>> #endif
>>
>
> That was also one of the considerations when writing patches.
> I'm planning to refactor most of the programs in the sample using
> libbpf, and found out that there are bunch of samples that tracks
> syscall with kprobe. Replacing all of them will take lots of macros
> and I thought using prefix will be better idea.
>
> Actually, my initial plan was to create macro of SYSCALL()
>
> #ifdef __x86_64__
> #define PREFIX "__x64_"
> #elif defined(__s390x__)
> #define PREFIX "__s390x_"
> #else
> #define PREFIX ""
> #endif
>
> #define SYSCALL(SYS) PREFIX ## SYS
>
> And to use this macro universally without creating additional headers,
> I was trying to add this to samples/bpf/syscall_nrs.c which later
> compiles to samples/bpf/syscall_nrs.h. But it was pretty hacky way and
> it won't work properly. So I ended up with just adding prefix to syscall.
I think it is okay to create a trace_common.h to have this definition
defined in one place and use them in bpf programs.
>
> Is it necessary to define all of the macro for each architecture?
Yes, if we define in trace_common.h, let us do for x64/x390x/others
similar to the above.
>
>>> +
>>> struct bpf_map_def SEC("maps") my_map = {
>>> .type = BPF_MAP_TYPE_HASH,
>>> .key_size = sizeof(long),
>>> @@ -77,7 +83,7 @@ struct bpf_map_def SEC("maps") my_hist_map = {
>>> .max_entries = 1024,
>>> };
>>>
>>> -SEC("kprobe/sys_write")
>>> +SEC("kprobe/" SYSCALL "sys_write")
>>> int bpf_prog3(struct pt_regs *ctx)
>>> {
>>> long write_size = PT_REGS_PARM3(ctx);
>> [...]
>
>
> Thank you for your time and effort for the review :)
>
> Best,
> Daniel
>
Powered by blists - more mailing lists