Message-ID: <20190223023136.ngg334d2k6aw4qzr@ast-mbp.dhcp.thefacebook.com>
Date: Fri, 22 Feb 2019 18:31:37 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Roman Gushchin <guro@...com>
Cc: Alexei Starovoitov <ast@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats
On Sat, Feb 23, 2019 at 12:34:38AM +0000, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
> > JITed BPF programs are indistinguishable from kernel functions, but unlike
> > kernel code, BPF code can be changed often.
> > The typical approach of "perf record" + "perf report" profiling and tuning of
> > kernel code works just as well for BPF programs, but kernel code doesn't
> > need to be monitored whereas BPF programs do.
> > Users load and run large numbers of BPF programs.
> > These BPF stats allow tools to monitor the usage of BPF on the server.
> > The monitoring tools will turn the sysctl kernel.bpf_stats_enabled
> > on and off for a few seconds to sample the average cost of the programs.
> > Aggregated data over hours and days will provide insight into the cost of BPF,
> > and alarms can trigger in case a given program suddenly gets more expensive.
> >
> > The cost of two sched_clock() calls per program invocation adds ~20 nsec.
> > Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
> > from ~40 nsec to ~60 nsec.
> > static_key minimizes the cost of the stats collection.
> > There is no measurable difference before/after this patch
> > with kernel.bpf_stats_enabled=0
> >
> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> > ---
> > include/linux/bpf.h | 7 +++++++
> > include/linux/filter.h | 16 +++++++++++++++-
> > kernel/bpf/core.c | 13 +++++++++++++
> > kernel/bpf/syscall.c | 24 ++++++++++++++++++++++--
> > kernel/bpf/verifier.c | 5 +++++
> > kernel/sysctl.c | 34 ++++++++++++++++++++++++++++++++++
> > 6 files changed, 96 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index de18227b3d95..14260674bc57 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
> >
> > #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
> >
> > +struct bpf_prog_stats {
> > + u64 cnt;
> > + u64 nsecs;
> > +};
> > +
> > struct bpf_prog_aux {
> > atomic_t refcnt;
> > u32 used_map_cnt;
> > @@ -389,6 +394,7 @@ struct bpf_prog_aux {
> > * main prog always has linfo_idx == 0
> > */
> > u32 linfo_idx;
> > + struct bpf_prog_stats __percpu *stats;
> > union {
> > struct work_struct work;
> > struct rcu_head rcu;
> > @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
> > void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >
> > extern int sysctl_unprivileged_bpf_disabled;
> > +extern int sysctl_bpf_stats_enabled;
> >
> > int bpf_map_new_fd(struct bpf_map *map, int flags);
> > int bpf_prog_new_fd(struct bpf_prog *prog);
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f32b3eca5a04..7b14235b6f7d 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -533,7 +533,21 @@ struct sk_filter {
> > struct bpf_prog *prog;
> > };
> >
> > -#define BPF_PROG_RUN(filter, ctx) ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> > +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> > +
> > +#define BPF_PROG_RUN(prog, ctx) ({ \
> > + u32 ret; \
> > + cant_sleep(); \
> > + if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
> > + u64 start = sched_clock(); \
> > + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > + this_cpu_inc(prog->aux->stats->cnt); \
> > + this_cpu_add(prog->aux->stats->nsecs, \
> > + sched_clock() - start); \
> > + } else { \
> > + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > + } \
> > + ret; })
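
As a side note on the usage described in the quoted commit message: a minimal
user-space sketch of the sampling flow (turn kernel.bpf_stats_enabled on for a
few seconds, then off again, and read the counters out-of-band) could look
like the following. Illustrative only; the helper name is made up for the
example, the /proc/sys path follows from the sysctl hunk in kernel/sysctl.c,
and error handling is trimmed.

/*
 * Sketch only, not part of the patch: open a short sampling window by
 * toggling kernel.bpf_stats_enabled via /proc/sys.
 */
#include <fcntl.h>
#include <unistd.h>

static void set_bpf_stats(int on)
{
	int fd = open("/proc/sys/kernel/bpf_stats_enabled", O_WRONLY);

	if (fd < 0)
		return;
	write(fd, on ? "1" : "0", 1);
	close(fd);
}

int main(void)
{
	set_bpf_stats(1);	/* open the sampling window */
	sleep(5);		/* let cnt/nsecs accumulate per program */
	set_bpf_stats(0);	/* close it; average cost = nsecs / cnt */
	return 0;
}
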
>
> Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?
XDP cannot afford to pay this penalty unconditionally.
It's not only the this_cpu_inc; it's the prog->aux->stats loads.
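
To make that trade-off concrete, here is a hypothetical variant of the macro
implementing the suggestion above (bump "cnt" unconditionally, gate only the
timing). It is a sketch, not part of the patch: even with the static key off,
every invocation would have to load prog->aux and prog->aux->stats and dirty a
per-cpu counter, which is exactly the overhead the XDP fast path cannot absorb.

/*
 * Hypothetical variant, not part of the patch: bump "cnt" unconditionally
 * and gate only the sched_clock() timing behind the static key.
 */
#define BPF_PROG_RUN_CNT_ALWAYS(prog, ctx) ({				\
	u32 ret;							\
	cant_sleep();							\
	/* runs on every invocation, even with stats disabled */	\
	this_cpu_inc((prog)->aux->stats->cnt);				\
	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
		u64 start = sched_clock();				\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
		this_cpu_add((prog)->aux->stats->nsecs,			\
			     sched_clock() - start);			\
	} else {							\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
	}								\
	ret; })

With the patch as posted, the static key keeps all of that off the fast path
when kernel.bpf_stats_enabled=0.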