netdev - Re: [PATCH v2 net-next 3/4] bpf: bpf_htab: Add syscall to iterate percpu value of a key

Message-ID: <20160114012433.GB43324@ast-mbp.thefacebook.com>
Date:	Wed, 13 Jan 2016 17:24:34 -0800
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Ming Lei <tom.leiming@...il.com>
Cc:	Martin KaFai Lau <kafai@...com>,
	Network Development <netdev@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	FB Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v2 net-next 3/4] bpf: bpf_htab: Add syscall to iterate
 percpu value of a key

On Wed, Jan 13, 2016 at 11:43:50PM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 1:23 PM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > On Wed, Jan 13, 2016 at 10:42:49AM +0800, Ming Lei wrote:
> >>
> >> So I don't think it is good to retrieve the value from one CPU per
> >> system call, and then accumulate them in userspace.
> >>
> >> One approach I thought of is to define a function (or something like it)
> >>
> >>  handle_cpu_value(u32 cpu, void *val_cpu, void *val_total)
> >>
> >> in bpf kernel code for collecting the value from each cpu and
> >> accumulating them into 'val_total'. In most situations, the
> >> function can be implemented without a loop.
> >> The kernel can call this function directly, and the total value can be
> >> returned to userspace by one single syscall.
> >>
> >> Alexei and anyone, could you comment on this draft idea for
> >> percpu map?
> >
> > I'm not sure how you expect user space to specify such callback.
> > Kernel cannot execute user code.
> 
> I mean the above callback function can be built into bpf code and then
> run from the kernel after loading, as in the packet filter case with tcpdump;
> maybe a new prog type is needed. It is doable in theory. I need to investigate
> a bit to understand how it can be called from the kernel. It might be OK
> to call it via kprobe, but that is not elegant just for accumulating the value
> from each CPU.

That would be total overkill.

> > Also syscall/malloc/etc. is noise compared to the IPI, and it
> > will still be there, so
> > for(all cpus) { syscall+ipi;} will have the same speed.
> 
> In the syscall path there are lots of slow things, and in the end the
> accumulated value is often stale and may not represent an accurate number
> at any time, so it can be thought of as invalid.

No. stale != invalid.
Some analytics/monitoring applications are fine with ballpark numbers,
and for them a regular hash map with non-atomic increments is good enough,
but others need accurate numbers, even though they may be seconds stale.
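
For illustration only, here is a minimal user-space sketch of the
accumulation step under discussion. It is not code from the patch series.
It borrows the handle_cpu_value() shape proposed earlier in the thread;
the u64 counter type, the const qualifier on val_cpu, and the sum_percpu()
helper are all assumptions made for the example. The idea it shows: if
each slot is only ever incremented by its own CPU, no increment is lost,
so a sum taken later may lag behind the live counters but is not corrupted
the way a shared counter with racing non-atomic increments can be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold one CPU's value into a running total. Same shape as the
 * handle_cpu_value() idea quoted in the thread; treating the values
 * as u64 counters is an assumption of this sketch.
 */
static void handle_cpu_value(uint32_t cpu, const void *val_cpu, void *val_total)
{
	(void)cpu; /* not needed for a plain sum */
	*(uint64_t *)val_total += *(const uint64_t *)val_cpu;
}

/*
 * Hypothetical helper: sum the per-CPU slots of a single key.
 * 'vals' holds one counter per CPU, as user space might have them
 * after fetching each CPU's value.
 */
static uint64_t sum_percpu(const uint64_t *vals, uint32_t ncpus)
{
	uint64_t total = 0;
	uint32_t cpu;

	assert(vals != NULL || ncpus == 0);
	for (cpu = 0; cpu < ncpus; cpu++)
		handle_cpu_value(cpu, &vals[cpu], &total);
	return total;
}
```

A caller would fill an array with one value per possible CPU and pass it
to sum_percpu(); how those values are fetched (one syscall per CPU or one
syscall for all of them) is exactly the open question in this thread and
is deliberately left out of the sketch.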