[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEf4BzZhsQeDn8biUnt9WXt6RVcW_PPX76YFyZo6CjEXGKTdDg@mail.gmail.com>
Date: Mon, 29 Jul 2024 14:01:20 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Leon Hwang <hffilwlqm@...il.com>
Cc: Yonghong Song <yonghong.song@...ux.dev>, Zheao Li <me@...jusaka.me>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v2] bpf: Add bpf_check_attach_target_with_klog
method to output failure logs to kernel
On Fri, Jul 26, 2024 at 9:04 PM Leon Hwang <hffilwlqm@...il.com> wrote:
>
>
>
> On 2024/7/27 08:12, Andrii Nakryiko wrote:
> > On Thu, Jul 25, 2024 at 7:57 PM Leon Hwang <hffilwlqm@...il.com> wrote:
> >>
> >>
> >>
> >> On 26/7/24 05:27, Andrii Nakryiko wrote:
> >>> On Thu, Jul 25, 2024 at 12:33 AM Leon Hwang <hffilwlqm@...il.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 25/7/24 14:09, Yonghong Song wrote:
> >>>>>
> >>>>> On 7/24/24 11:05 PM, Leon Hwang wrote:
> >>>>>>
> >>>>>> On 25/7/24 13:54, Yonghong Song wrote:
> >>>>>>> On 7/24/24 10:15 PM, Zheao Li wrote:
> >>>>>>>> This is a v2 patch, previous Link:
> >>>>>>>> https://lore.kernel.org/bpf/20240724152521.20546-1-me@manjusaka.me/T/#u
> >>>>>>>>
> >>
> >> [SNI]
> >>
> >
> > [...]
> >
> >>>
> >>
> >> Build and run, sudo ./retsnoop -e verbose -e bpf_log -e
> >> bpf_verifier_vlog -e bpf_verifier_log_write -STA -v, here's the output:
> >>
> >>
> >> FUNCTION CALLS RESULT DURATION ARGS
> >> -------------- ------ -------- ----
> >> ↔ bpf_log [void] 1.350us log=NULL fmt='%s() is not a global
> >> function ' =(vararg)
> >>
> >> It's great to show arguments.
> >>
> >
> > Thanks for repro steps, they worked. Also, I just pushed latest
> > retsnoop version to Github that does support capturing vararg
> > arguments for printf-like functions. See full debugging log at [0],
> > but I basically did just two things:
> >
> > $ sudo retsnoop -e '*sys_bpf' --lbr -n freplace
> >
> > -n freplace filters by process name, to avoid the noise. I traced
> > bpf() syscall (*sys_bf), and I requested function call LBR (Last
> > Branch Record) stack. LBR showed that we have
> > bpf_prog_attach_check_attach_type() call, and then eventually we get
> > to bpf_log().
> >
> > So I then traced bpf_log (no --lbr this time, but I requested function
> > trace + arguments capture:
> >
> > $ sudo retsnoop -n freplace -e '*sys_bpf' -a bpf_log -TA
> >
> > 17:02:39.968302 -> 17:02:39.968307 TID/PID 2730863/2730855 (freplace/freplace):
> >
> > FUNCTION CALLS RESULT DURATION ARGS
> > ----------------- --------- -------- ----
> > → __x64_sys_bpf
> > regs=&{.r15=2,.r14=0xc0000061c0,.bp=0xc00169f8a8,.bx=28,.r11=514,.ax=0xffffffffffffffda,.cx=0x404f4e,.dx=64,.si=0xc00169fa10…
> > → __sys_bpf cmd=28
> > uattr={{.kernel=0xc00169fa10,.user=0xc00169fa10}} size=64
> > ↔ bpf_log [void] 1.550us log=NULL fmt='%s() is not a
> > global function ' vararg0='stub_handler_static'
> > ← __sys_bpf [-EINVAL] 4.115us
> > ← __x64_sys_bpf [-EINVAL] 5.467us
> >
> >
> > For __x64_sys_bpf that's struct pt_regs, which isn't that interesting,
> > but then we have:
> >
> > ↔ bpf_log [void] 1.550us log=NULL fmt='%s() is not a global
> > function ' vararg0='stub_handler_static'
>
> It's awesome to show vararg.
>
> >
> > Which showed format string and the argument passed to it:
> > 'stub_hanler_static' subprogram seems to be the problem here.
> >
> >
> > Anyways, tbh, for a problem like this, it's probably best to just
> > request a verbose log when doing the BPF_PROG_LOAD command. You can
> > *normally* use veristat tool to get that easily, if you have a .bpf.o
> > object file on the disk. But in this case it's freplace and veristat
> > doesn't know what's the target BPF program, so it's not that useful in
> > this case:
> >
> > $ sudo veristat -v freplace_bpfel.o
> > Processing 'freplace_bpfel.o'...
> > libbpf: prog 'freplace_handler': attach program FD is not set
> > libbpf: prog 'freplace_handler': failed to prepare load attributes: -22
> > libbpf: prog 'freplace_handler': failed to load: -22
> > libbpf: failed to load object 'freplace_bpfel.o'
> > PROCESSING freplace_bpfel.o/freplace_handler, DURATION US: 0, VERDICT:
> > failure, VERIFIER LOG:
> >
> > File Program Verdict Duration (us) Insns
> > States Peak states
> > ---------------- ---------------- ------- ------------- -----
> > ------ -----------
> > freplace_bpfel.o freplace_handler failure 0 0
> > 0 0
> > ---------------- ---------------- ------- ------------- -----
> > ------ -----------
> > Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.
> >
> > But for lots of other programs this would be a no-brainer.
> >
> >
> > [0] https://gist.github.com/anakryiko/88a1597a68e43dc945e40fde88a96e7e
> >
> > [...]
> >
> >>
> >> Is it OK to add a tracepoint here? I think tracepoint is more generic
> >> than retsnoop-like way.
> >
> > I personally don't see a problem with adding tracepoint, but how would
> > it look like, given we are talking about vararg printf-style function
> > calls? I'm not sure how that should be represented in such a way as to
> > make it compatible with tracepoints and not cause any runtime
> > overhead.
>
> The tracepoint is not about vararg printf-style function calls.
>
> It is to trace the reason why it fails to bpf_check_attach_target() at
> attach time.
>
Oh, that changes things. I don't think we can keep adding extra
tracepoints for various potential reasons that BPF prog might be
failing to verify.
But there is usually no need either. This particular code already
supports emitting extra information into verifier log, you just have
to provide that. This is done by libbpf automatically, can't your
library of choice do the same (if BPF program failed).
Why go to all this trouble if we already have a facility to debug
issues like this. Note every issue is logged into verifier log, but in
this case it is.
> So, let me introduce bpf_check_attach_target_with_tracepoint() and
> BPF_LOG_KERNEL_WITHOUT_PRINT. bpf_check_attach_target_with_tracepoint()
> is to call bpf_check_attach_target() and then to call
> trace_bpf_check_attach_target() if err. BPF_LOG_KERNEL_WITHOUT_PRINT is
> to avoid pr_err() in bpf_verifier_vlog().
>
> Here's the diff without trace_bpf_check_attach_target() definition:
[...]
Powered by blists - more mailing lists