[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200318222953.GA2507308@mini-arch.hsd1.ca.comcast.net>
Date: Wed, 18 Mar 2020 15:29:53 -0700
From: Stanislav Fomichev <sdf@...ichev.me>
To: Song Liu <songliubraving@...com>
Cc: Daniel Borkmann <daniel@...earbox.net>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Networking <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"ast@...nel.org" <ast@...nel.org>,
"mcgrof@...nel.org" <mcgrof@...nel.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"yzaikin@...gle.com" <yzaikin@...gle.com>
Subject: Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with
/dev/bpf_stats
On 03/18, Song Liu wrote:
>
>
> > On Mar 18, 2020, at 1:58 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
> >
> > On 3/18/20 7:33 AM, Song Liu wrote:
> >>> On Mar 17, 2020, at 4:08 PM, Song Liu <songliubraving@...com> wrote:
> >>>> On Mar 17, 2020, at 2:47 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
> >>>>>>
> >>>>>> Hm, true as well. Wouldn't long-term extending "bpftool prog profile" fentry/fexit
> >>>>>> programs supersede this old bpf_stats infrastructure? Iow, can't we implement the
> >>>>>> same (or even more elaborate stats aggregation) in BPF via fentry/fexit and then
> >>>>>> potentially deprecate bpf_stats counters?
> >>>>> I think run_time_ns has its own value as a simple monitoring framework. We can
> >>>>> use it in tools like top (and variations). It will be easier for these tools to
> >>>>> adopt run_time_ns than using fentry/fexit.
> >>>>
> >>>> Agree that this is easier; I presume there is no such official integration today
> >>>> in tools like top, right, or is there anything planned?
> >>>
> >>> Yes, we do want more supports in different tools to increase the visibility.
> >>> Here is the effort for atop: https://github.com/Atoptool/atop/pull/88 .
> >>>
> >>> I wasn't pushing push hard on this one mostly because the sysctl interface requires
> >>> a user space "owner".
> >>>
> >>>>> On the other hand, in long term, we may include a few fentry/fexit based programs
> >>>>> in the kernel binary (or the rpm), so that these tools can use them easily. At
> >>>>> that time, we can fully deprecate run_time_ns. Maybe this is not too far away?
> >>>>
> >>>> Did you check how feasible it is to have something like `bpftool prog profile top`
> >>>> which then enables fentry/fexit for /all/ existing BPF programs in the system? It
> >>>> could then sort the sample interval by run_cnt, cycles, cache misses, aggregated
> >>>> runtime, etc in a top-like output. Wdyt?
> >>>
> >>> I wonder whether we can achieve this with one bpf prog (or a trampoline) that covers
> >>> all BPF programs, like a trampoline inside __BPF_PROG_RUN()?
> >>>
> >>> For long term direction, I think we could compare two different approaches: add new
> >>> tools (like bpftool prog profile top) vs. add BPF support to existing tools. The
> >>> first approach is easier. The latter approach would show BPF information to users
> >>> who are not expecting BPF programs in the systems. For many sysadmins, seeing BPF
> >>> programs in top/ps, and controlling them via kill is more natural than learning
> >>> bpftool. What's your thought on this?
> >> More thoughts on this.
> >> If we have a special trampoline that attach to all BPF programs at once, we really
> >> don't need the run_time_ns stats anymore. Eventually, tools that monitor BPF
> >> programs will depend on libbpf, so using fentry/fexit to monitor BPF programs doesn't
> >> introduce extra dependency. I guess we also need a way to include BPF program in
> >> libbpf.
> >> To summarize this plan, we need:
> >> 1) A global trampoline that attaches to all BPF programs at once;
> >
> > Overall sounds good, I think the `at once` part might be tricky, at least it would
> > need to patch one prog after another, each prog also needs to store its own metrics
> > somewhere for later collection. The start-to-sample could be a shared global var (aka
> > shared map between all the programs) which would flip the switch though.
>
> I was thinking about adding bpf_global_trampoline and use it in __BPF_PROG_RUN.
> Something like:
>
> diff --git i/include/linux/filter.h w/include/linux/filter.h
> index 9b5aa5c483cc..ac9497d1fa7b 100644
> --- i/include/linux/filter.h
> +++ w/include/linux/filter.h
> @@ -559,9 +559,14 @@ struct sk_filter {
>
> DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
>
> +extern struct bpf_trampoline *bpf_global_trampoline;
> +DECLARE_STATIC_KEY_FALSE(bpf_global_tr_active);
> +
> #define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \
> u32 ret; \
> cant_migrate(); \
> + if (static_branch_unlikely(&bpf_global_tr_active)) \
> + run_the_trampoline(); \
> if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
> struct bpf_prog_stats *stats; \
> u64 start = sched_clock(); \
>
>
> I am not 100% sure this is OK.
>
> I am also not sure whether this is an overkill. Do we really want more complex
> metric for all BPF programs? Or run_time_ns is enough?
I was thinking about exporting a real distribution of the prog runtimes
instead of doing an average. It would be interesting to see
50%/95%/99%/max stats.
Powered by blists - more mailing lists