netdev - Re: [PATCH bpf-next 1/4] samples: bpf: fix bpf programs with kprobe/sys

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzYx8dT3nFx69-oXXqmwBXia62bTbjG3Nb9X7vz=OxefFg@mail.gmail.com>
Date:   Mon, 6 Jul 2020 16:49:52 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     "Daniel T. Lee" <danieltimlee@...il.com>
Cc:     Yonghong Song <yhs@...com>, Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Andrii Nakryiko <andriin@...com>,
        netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next 1/4] samples: bpf: fix bpf programs with
 kprobe/sys_connect event

On Mon, Jul 6, 2020 at 3:28 AM Daniel T. Lee <danieltimlee@...il.com> wrote:
>
> On Fri, Jul 3, 2020 at 1:04 AM Yonghong Song <yhs@...com> wrote:
> >
> >
> >
> > On 7/2/20 4:13 AM, Daniel T. Lee wrote:
> > > On Thu, Jul 2, 2020 at 2:13 PM Yonghong Song <yhs@...com> wrote:
> > >>
> > >>
> > >>
> > >> On 7/1/20 7:16 PM, Daniel T. Lee wrote:
> > >>> Currently, BPF programs with kprobe/sys_connect does not work properly.
> > >>>
> > >>> Commit 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > >>> This commit modifies the bpf_load behavior of kprobe events in the x64
> > >>> architecture. If the current kprobe event target starts with "sys_*",
> > >>> add the prefix "__x64_" to the front of the event.
> > >>>
> > >>> Appending "__x64_" prefix with kprobe/sys_* event was appropriate as a
> > >>> solution to most of the problems caused by the commit below.
> > >>>
> > >>>       commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct
> > >>>       pt_regs-based sys_*() to __x64_sys_*()")
> > >>>
> > >>> However, there is a problem with the sys_connect kprobe event that does
> > >>> not work properly. For __sys_connect event, parameters can be fetched
> > >>> normally, but for __x64_sys_connect, parameters cannot be fetched.
> > >>>
> > >>> Because of this problem, this commit fixes the sys_connect event by
> > >>> specifying the __sys_connect directly and this will bypass the
> > >>> "__x64_" appending rule of bpf_load.
> > >>
> > >> In the kernel code, we have
> > >>
> > >> SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
> > >>                   int, addrlen)
> > >> {
> > >>           return __sys_connect(fd, uservaddr, addrlen);
> > >> }
> > >>
> > >> Depending on compiler, there is no guarantee that __sys_connect will
> > >> not be inlined. I would prefer to still use the entry point
> > >> __x64_sys_* e.g.,
> > >>      SEC("kprobe/" SYSCALL(sys_write))
> > >>
> > >
> > > As you mentioned, there is clearly a possibility that problems may arise
> > > because the symbol does not exist according to the compiler.
> > >
> > > However, in x64, when using Kprobe for __x64_sys_connect event, the
> > > tests are not working properly because the parameters cannot be fetched,
> > > and the test under selftests/bpf is using "kprobe/_sys_connect" directly.
> >
> > This is the assembly code for __x64_sys_connect.
> >
> > ffffffff818d3520 <__x64_sys_connect>:
> > ffffffff818d3520: e8 fb df 32 00        callq   0xffffffff81c01520
> > <__fentry__>
> > ffffffff818d3525: 48 8b 57 60           movq    96(%rdi), %rdx
> > ffffffff818d3529: 48 8b 77 68           movq    104(%rdi), %rsi
> > ffffffff818d352d: 48 8b 7f 70           movq    112(%rdi), %rdi
> > ffffffff818d3531: e8 1a ff ff ff        callq   0xffffffff818d3450
> > <__sys_connect>
> > ffffffff818d3536: 48 98                 cltq
> > ffffffff818d3538: c3                    retq
> > ffffffff818d3539: 0f 1f 80 00 00 00 00  nopl    (%rax)
> >
> > In bpf program, the step is:
> >        struct pt_regs *real_regs = PT_REGS_PARM1(pt_regs);
> >        param1 = PT_REGS_PARM1(real_regs);
> >        param2 = PT_REGS_PARM2(real_regs);
> >        param3 = PT_REGS_PARM3(real_regs);
> > The same for s390.
> >
>
> I'm sorry that I seem to get it wrong,
> But is it available to access 'struct pt_regs *' recursively?
>
> It seems nested use of PT_REGS_PARM causes invalid memory access.
>
>     $ sudo ./test_probe_write_user
>     libbpf: load bpf program failed: Permission denied
>     libbpf: -- BEGIN DUMP LOG ---
>     libbpf:
>     Unrecognized arg#0 type PTR
>     ; struct pt_regs *real_regs = PT_REGS_PARM1(ctx);
>     0: (79) r1 = *(u64 *)(r1 +112)
>     ; void *sockaddr_arg = (void *)PT_REGS_PARM2(real_regs);
>     1: (79) r6 = *(u64 *)(r1 +104)
>     R1 invalid mem access 'inv'
>     processed 2 insns (limit 1000000) max_states_per_insn 0
> total_states 0 peak_states 0 mark_read 0
>
>     libbpf: -- END LOG --
>     libbpf: failed to load program 'kprobe/__x64_sys_connect'
>     libbpf: failed to load object './test_probe_write_user_kern.o'
>     ERROR: loading BPF object file failed
>
> I'm not fully aware of the BPF verifier's internal structure.
> Is there any workaround to solve this problem?

You need to use bpf_probe_read_kernel() to get those arguments from
real_args. Or better just use PT_REGS_PARM1_CORE(x) and others, which
does that for you (+ CO-RE relocation).


>
> Thanks for your time and effort for the review.
> Daniel.
>
> >
> > For other architectures, no above indirection is needed.
> >
> > I guess you can abstract the above into trace_common.h?
> >
> > >
> > > I'm not sure how to deal with this problem. Any advice and suggestions
> > > will be greatly appreciated.
> > >
> > > Thanks for your time and effort for the review.
> > > Daniel
> > >
> > >>>
> > >>> Fixes: 34745aed515c ("samples/bpf: fix kprobe attachment issue on x64")
> > >>> Signed-off-by: Daniel T. Lee <danieltimlee@...il.com>
> > >>> ---
> > >>>    samples/bpf/map_perf_test_kern.c         | 2 +-
> > >>>    samples/bpf/test_map_in_map_kern.c       | 2 +-
> > >>>    samples/bpf/test_probe_write_user_kern.c | 2 +-
> > >>>    3 files changed, 3 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/samples/bpf/map_perf_test_kern.c b/samples/bpf/map_perf_test_kern.c
> > >>> index 12e91ae64d4d..cebe2098bb24 100644
> > >>> --- a/samples/bpf/map_perf_test_kern.c
> > >>> +++ b/samples/bpf/map_perf_test_kern.c
> > >>> @@ -154,7 +154,7 @@ int stress_percpu_hmap_alloc(struct pt_regs *ctx)
> > >>>        return 0;
> > >>>    }
> > >>>
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int stress_lru_hmap_alloc(struct pt_regs *ctx)
> > >>>    {
> > >>>        char fmt[] = "Failed at stress_lru_hmap_alloc. ret:%dn";
> > >>> diff --git a/samples/bpf/test_map_in_map_kern.c b/samples/bpf/test_map_in_map_kern.c
> > >>> index 6cee61e8ce9b..b1562ba2f025 100644
> > >>> --- a/samples/bpf/test_map_in_map_kern.c
> > >>> +++ b/samples/bpf/test_map_in_map_kern.c
> > >>> @@ -102,7 +102,7 @@ static __always_inline int do_inline_hash_lookup(void *inner_map, u32 port)
> > >>>        return result ? *result : -ENOENT;
> > >>>    }
> > >>>
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int trace_sys_connect(struct pt_regs *ctx)
> > >>>    {
> > >>>        struct sockaddr_in6 *in6;
> > >>> diff --git a/samples/bpf/test_probe_write_user_kern.c b/samples/bpf/test_probe_write_user_kern.c
> > >>> index 6579639a83b2..9b3c3918c37d 100644
> > >>> --- a/samples/bpf/test_probe_write_user_kern.c
> > >>> +++ b/samples/bpf/test_probe_write_user_kern.c
> > >>> @@ -26,7 +26,7 @@ struct {
> > >>>     * This example sits on a syscall, and the syscall ABI is relatively stable
> > >>>     * of course, across platforms, and over time, the ABI may change.
> > >>>     */
> > >>> -SEC("kprobe/sys_connect")
> > >>> +SEC("kprobe/__sys_connect")
> > >>>    int bpf_prog1(struct pt_regs *ctx)
> > >>>    {
> > >>>        struct sockaddr_in new_addr, orig_addr = {};
> > >>>