Message-ID: <20190223023136.ngg334d2k6aw4qzr@ast-mbp.dhcp.thefacebook.com>
Date: Fri, 22 Feb 2019 18:31:37 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Roman Gushchin <guro@...com>
Cc: Alexei Starovoitov <ast@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats
On Sat, Feb 23, 2019 at 12:34:38AM +0000, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 03:36:41PM -0800, Alexei Starovoitov wrote:
> > JITed BPF programs are indistinguishable from kernel functions, but unlike
> > kernel code, BPF code can be changed often.
> > The typical approach of "perf record" + "perf report" profiling and tuning of
> > kernel code works just as well for BPF programs, but kernel code doesn't
> > need to be monitored whereas BPF programs do.
> > Users load and run large numbers of BPF programs.
> > These BPF stats allow tools to monitor the usage of BPF on the server.
> > The monitoring tools will turn the sysctl kernel.bpf_stats_enabled
> > on and off for a few seconds to sample the average cost of the programs.
> > Aggregated data over hours and days will provide insight into the cost of BPF,
> > and alarms can trigger in case a given program suddenly gets more expensive.
> >
> > The cost of two sched_clock() calls per program invocation adds ~20 nsec.
> > Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
> > from ~40 nsec to ~60 nsec.
> > static_key minimizes the cost of the stats collection.
> > There is no measurable difference before/after this patch
> > with kernel.bpf_stats_enabled=0
> >
> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> > ---
> > include/linux/bpf.h | 7 +++++++
> > include/linux/filter.h | 16 +++++++++++++++-
> > kernel/bpf/core.c | 13 +++++++++++++
> > kernel/bpf/syscall.c | 24 ++++++++++++++++++++++--
> > kernel/bpf/verifier.c | 5 +++++
> > kernel/sysctl.c | 34 ++++++++++++++++++++++++++++++++++
> > 6 files changed, 96 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index de18227b3d95..14260674bc57 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -340,6 +340,11 @@ enum bpf_cgroup_storage_type {
> >
> > #define MAX_BPF_CGROUP_STORAGE_TYPE __BPF_CGROUP_STORAGE_MAX
> >
> > +struct bpf_prog_stats {
> > + u64 cnt;
> > + u64 nsecs;
> > +};
> > +
> > struct bpf_prog_aux {
> > atomic_t refcnt;
> > u32 used_map_cnt;
> > @@ -389,6 +394,7 @@ struct bpf_prog_aux {
> > * main prog always has linfo_idx == 0
> > */
> > u32 linfo_idx;
> > + struct bpf_prog_stats __percpu *stats;
> > union {
> > struct work_struct work;
> > struct rcu_head rcu;
> > @@ -559,6 +565,7 @@ void bpf_map_area_free(void *base);
> > void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >
> > extern int sysctl_unprivileged_bpf_disabled;
> > +extern int sysctl_bpf_stats_enabled;
> >
> > int bpf_map_new_fd(struct bpf_map *map, int flags);
> > int bpf_prog_new_fd(struct bpf_prog *prog);
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f32b3eca5a04..7b14235b6f7d 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -533,7 +533,21 @@ struct sk_filter {
> > struct bpf_prog *prog;
> > };
> >
> > -#define BPF_PROG_RUN(filter, ctx) ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
> > +DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
> > +
> > +#define BPF_PROG_RUN(prog, ctx) ({ \
> > + u32 ret; \
> > + cant_sleep(); \
> > + if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
> > + u64 start = sched_clock(); \
> > + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > + this_cpu_inc(prog->aux->stats->cnt); \
> > + this_cpu_add(prog->aux->stats->nsecs, \
> > + sched_clock() - start); \
> > + } else { \
> > + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
> > + } \
> > + ret; })
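
As a side note on the usage described in the quoted commit message: a minimal
user-space sketch of the sampling flow (turn kernel.bpf_stats_enabled on for a
few seconds, then off again, and read the counters out-of-band) could look
like the following. Illustrative only; the helper name is made up for the
example, the /proc/sys path follows from the sysctl hunk in kernel/sysctl.c,
and error handling is trimmed.

/*
 * Sketch only, not part of the patch: open a short sampling window by
 * toggling kernel.bpf_stats_enabled via /proc/sys.
 */
#include <fcntl.h>
#include <unistd.h>

static void set_bpf_stats(int on)
{
	int fd = open("/proc/sys/kernel/bpf_stats_enabled", O_WRONLY);

	if (fd < 0)
		return;
	write(fd, on ? "1" : "0", 1);
	close(fd);
}

int main(void)
{
	set_bpf_stats(1);	/* open the sampling window */
	sleep(5);		/* let cnt/nsecs accumulate per program */
	set_bpf_stats(0);	/* close it; average cost = nsecs / cnt */
	return 0;
}
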
>
> Can we bump "cnt" unconditionally and gate only the "nsecs" calculation?
XDP cannot afford to pay this penalty unconditionally.
It's not only the this_cpu_inc; it's the prog->aux->stats loads.
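
To make that trade-off concrete, here is a hypothetical variant of the macro
implementing the suggestion above (bump "cnt" unconditionally, gate only the
timing). It is a sketch, not part of the patch: even with the static key off,
every invocation would have to load prog->aux and prog->aux->stats and dirty a
per-cpu counter, which is exactly the overhead the XDP fast path cannot absorb.

/*
 * Hypothetical variant, not part of the patch: bump "cnt" unconditionally
 * and gate only the sched_clock() timing behind the static key.
 */
#define BPF_PROG_RUN_CNT_ALWAYS(prog, ctx) ({				\
	u32 ret;							\
	cant_sleep();							\
	/* runs on every invocation, even with stats disabled */	\
	this_cpu_inc((prog)->aux->stats->cnt);				\
	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
		u64 start = sched_clock();				\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
		this_cpu_add((prog)->aux->stats->nsecs,			\
			     sched_clock() - start);			\
	} else {							\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);		\
	}								\
	ret; })

With the patch as posted, the static key keeps all of that off the fast path
when kernel.bpf_stats_enabled=0.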