netdev - Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160112054928.GA31180@ast-mbp.thefacebook.com>
Date:	Mon, 11 Jan 2016 21:49:29 -0800
From:	Alexei Starovoitov <alexei.starovoitov@...il.com>
To:	Ming Lei <tom.leiming@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Alexei Starovoitov <ast@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Network Development <netdev@...r.kernel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Martin KaFai Lau <kafai@...com>
Subject: Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update
 elem

On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
> Hi Alexei,
> 
> Thanks for your review.
> 
> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> >> Prepare for supporting percpu map in the following patch.
> >>
> >> Now userspace can lookup/update mapped value in one specific
> >> CPU in case of percpu map.
> >>
> >> Signed-off-by: Ming Lei <tom.leiming@...il.com>
> > ...
> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
> >>               goto free_key;
> >>
> >>       rcu_read_lock();
> >> -     ptr = map->ops->map_lookup_elem(map, key);
> >> +     if (!percpu)
> >> +             ptr = map->ops->map_lookup_elem(map, key);
> >> +     else
> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
> >
> > I think this approach is less potent than Martin's for several reasons:
> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
> >   it's error prone and a bit slower than doing it explicitly as in:
> >   http://patchwork.ozlabs.org/patch/564482/
> >   although Martin's patch also needs to use this_cpu_ptr() instead
> >   of per_cpu_ptr(.., smp_processor_id());
> 
> For PERCPU map, smp_processor_id() is definitely required, and
> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
> in his patch.

hmm. it's definitely _not_ required. right?
bpf programs shouldn't be accessing other per-cpu regions
only their own. That's what this_cpu_ptr is for.
I don't see a case where accessing other cpu per-cpu element
wouldn't be a bug in the program.

> > - two new bpf helpers are not necessary in Martin's approach.
> >   regular map_lookup_elem() will work for both per-cpu maps.
> 
> For percpu ARRAY, they are not necessary, but it is flexiable to
> provide them since we should allow prog to retrieve the perpcu
> value, also it is easier to implement the system call with the two
> helpers.
> 
> For percpu HASH, they are required since eBPF prog need to support
> deleting element, so we have provide these helpers for prog to retrieve
> percpu value before deleting the elem.

bpf programs cannot have loops, so there is no valid case to access
other cpu element, since program cannot aggregate all-cpu values.
Therefore the programs can only update/lookup this_cpu element and
delete such element across all cpus.

> > - such map_lookup_elem_percpu() from syscall is not accurate.
> >   Martin's approach via smp_call_function_single() returns precise value,
> 
> I don't understand why Martin's approach is precise and my patch isn't,
> could you explain it a bit?

because simple mempcy() called from syscall will race with lookup/increment
done to this_cpu element on another cpu. To avoid this race the smp_call
is needed, so that memcpy() happens on the cpu that updated the element,
so smp_call's memcpy and bpf program won't be touch that cpu value
at the same time and user space will read the correct element values.
If program updates them a lot, the value that user space reads will become
stale very quickly, but it will be valid. That's especially important
when program have multiple counters inside single element value.

> >   whereas here memcpy() will race with other cpus.
> >
> > Overall I think both pre-cpu hash and per-cpu array maps are quite useful.
> 
> percpu hash isn't a must since we can get similar effect by making real_key
> and cpu_id as key with less memory consumption, but we can introduce that.

I don't think so. bpf programs shouldn't be dealing with smp_processor_id()
It was poor man's per-cpu hack and it had too many disadvantages.
Like get_next_key() doesn't work properly when key is {key+processor_id},
so walking over hash map to aggregate fake per-cpu elements requires
user space to create another map just for walking.
map->max_entries limit becomes bogus.
this_cpu_ptr(..) is typically faster than per_cpu_ptr(.., smp_proc_id())