lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160113022204.GA25270@kafai-mba.dhcp.thefacebook.com>
Date:	Tue, 12 Jan 2016 18:22:04 -0800
From:	Martin KaFai Lau <kafai@...com>
To:	Ming Lei <tom.leiming@...il.com>
CC:	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Alexei Starovoitov <ast@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Network Development <netdev@...r.kernel.org>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update
 elem

On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 3:10 AM, Martin KaFai Lau <kafai@...com> wrote:
> > On Tue, Jan 12, 2016 at 07:05:47PM +0800, Ming Lei wrote:
> >> On Tue, Jan 12, 2016 at 1:49 PM, Alexei Starovoitov
> >> <alexei.starovoitov@...il.com> wrote:
> >> > On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
> >> >> Hi Alexei,
> >> >>
> >> >> Thanks for your review.
> >> >>
> >> >> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
> >> >> <alexei.starovoitov@...il.com> wrote:
> >> >> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> >> >> >> Prepare for supporting percpu map in the following patch.
> >> >> >>
> >> >> >> Now userspace can lookup/update mapped value in one specific
> >> >> >> CPU in case of percpu map.
> >> >> >>
> >> >> >> Signed-off-by: Ming Lei <tom.leiming@...il.com>
> >> >> > ...
> >> >> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
> >> >> >>               goto free_key;
> >> >> >>
> >> >> >>       rcu_read_lock();
> >> >> >> -     ptr = map->ops->map_lookup_elem(map, key);
> >> >> >> +     if (!percpu)
> >> >> >> +             ptr = map->ops->map_lookup_elem(map, key);
> >> >> >> +     else
> >> >> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
> >> >> >
> >> >> > I think this approach is less potent than Martin's for several reasons:
> >> >> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
> >> >> >   it's error prone and a bit slower than doing it explicitly as in:
> >> >> >   https://urldefense.proofpoint.com/v2/url?u=http-3A__patchwork.ozlabs.org_patch_564482_&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=kb6DfquDoMLBv0hgOO76O9SMvdCnhwnEwhgON8868I8&s=QtJkMfQDB55jn_aA_umJ8jiJRQlQhW5UxYO5YdxuGNI&e=
> >> >> >   although Martin's patch also needs to use this_cpu_ptr() instead
> >> >> >   of per_cpu_ptr(.., smp_processor_id());
> >> >>
> >> >> For PERCPU map, smp_processor_id() is definitely required, and
> >> >> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
> >> >> in his patch.
> >> >
> >> > hmm. it's definitely _not_ required. right?
> >> > bpf programs shouldn't be accessing other per-cpu regions
> >> > only their own. That's what this_cpu_ptr is for.
> >> > I don't see a case where accessing other cpu per-cpu element
> >> > wouldn't be a bug in the program.
> >> >
> >> >> > - two new bpf helpers are not necessary in Martin's approach.
> >> >> >   regular map_lookup_elem() will work for both per-cpu maps.
> >> >>
> >> >> For percpu ARRAY, they are not necessary, but it is flexiable to
> >> >> provide them since we should allow prog to retrieve the perpcu
> >> >> value, also it is easier to implement the system call with the two
> >> >> helpers.
> >> >>
> >> >> For percpu HASH, they are required since eBPF prog need to support
> >> >> deleting element, so we have provide these helpers for prog to retrieve
> >> >> percpu value before deleting the elem.
> >> >
> >> > bpf programs cannot have loops, so there is no valid case to access
> >> > other cpu element, since program cannot aggregate all-cpu values.
> >> > Therefore the programs can only update/lookup this_cpu element and
> >> > delete such element across all cpus.
> >>
> >> Looks I missed the point of looping constraint, then basically delete element
> >> helper doesn't make sense in percpu hash.
> >>
> >> >
> >> >> > - such map_lookup_elem_percpu() from syscall is not accurate.
> >> >> >   Martin's approach via smp_call_function_single() returns precise value,
> >> >>
> >> >> I don't understand why Martin's approach is precise and my patch isn't,
> >> >> could you explain it a bit?
> >> >
> >> > because simple mempcy() called from syscall will race with lookup/increment
> >> > done to this_cpu element on another cpu. To avoid this race the smp_call
> >> > is needed, so that memcpy() happens on the cpu that updated the element,
> >> > so smp_call's memcpy and bpf program won't be touch that cpu value
> >> > at the same time and user space will read the correct element values.
> >> > If program updates them a lot, the value that user space reads will become
> >> > stale very quickly, but it will be valid. That's especially important
> >> > when program have multiple counters inside single element value.
> >>
> >> But smp_call is often very slow because of IPI, so the value acculated
> >> finally becomes stale easily even though the value from the requested cpu
> >> is 'precise' at the exact time, especially when there are lots of CPUs, so I
> >> think using smp_call is really a bad idea. And smp_call is worse than
> >> iterating from CPUs simply.
> > The userspace usually only aggregates value across all cpu every X seconds.
>
> That is just in your case, and Alexei worried the issue of data stale.
I believe we are talking about validity of a value.  How to
make use of a less-stale but invalid data?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ