Message-ID: <4258c8c5-f18e-65ba-cb14-4a2f2e3187cc@iogearbox.net>
Date: Fri, 12 Oct 2018 10:39:07 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: alexei.starovoitov@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 2/2] bpf, libbpf: simplify perf RB walk and do
incremental updates
On 10/12/2018 05:04 AM, Jakub Kicinski wrote:
> On Thu, 11 Oct 2018 16:02:07 +0200, Daniel Borkmann wrote:
>> Clean up and improve bpf_perf_event_read_simple() ring walk a bit
>> to use similar tail update scheme as in perf and bcc allowing the
>> kernel to make forward progress not only after full timely walk.
>
> The extra memory barriers won't impact performance? If I read the code
> correctly we now have:
>
> while (bla) {
> 	head = HEAD
> 	rmb()
>
> 	...
>
> 	mb()
> 	TAIL = tail
> }
>
> Would it make sense to try to piggy back on the mb() for head re-read
> at least? Perhaps that's a non-issue, just wondering.
From the scheme specified in the comment in the prior patch, my understanding
is that they don't pair (see B and C), so there would be no guarantee that the
load of head happens before the load of data. Fwiw, I've been using the exact
same semantics here as the user space perf tool when it walks the perf mmap'ed
ring buffer (tools/perf/util/mmap.{h,c}). Given the kernel doesn't stop pushing
into the ring buffer while user space walks it, indicating how far we have
consumed data via the tail update allows making room successively and not only
after the full run has completed. That way the generic libbpf helper makes no
assumptions about how slow or quick the callback processing is, how full the
ring is, etc, and new data the kernel pushes can be processed in the same run
if necessary. One thing we could consider is to batch tail updates, say, every
8 elements plus a final update once we break out of walking the ring; probably
okay'ish as a heuristic.
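
For context, here is a minimal sketch of the consumer-side walk being
discussed; the helper name, the full barriers and the simplified wrap
handling are my assumptions for illustration, not the actual libbpf code
(the B/C/D labels refer to the usual perf ring ordering scheme):

#include <linux/perf_event.h>
#include <linux/types.h>

typedef void (*event_fn_t)(struct perf_event_header *ev, void *priv);

/* Sketch of a perf ring consumer doing incremental tail updates.
 * 'base' points at the data pages, 'mask' is data size - 1 (power of two).
 * Note: a real walk (as in libbpf/perf) must also handle records that
 * wrap around the end of the ring by copying them into a linear buffer.
 */
static void ring_walk(volatile struct perf_event_mmap_page *hdr,
		      unsigned char *base, __u64 mask,
		      event_fn_t fn, void *priv)
{
	__u64 head, tail = hdr->data_tail;

	do {
		head = hdr->data_head;	/* load producer position ...      */
		__sync_synchronize();	/* ... (C) pairs with the kernel's
					 * (B) barrier before it stores
					 * data_head                       */
		while (tail != head) {
			struct perf_event_header *ev;

			ev = (void *)(base + (tail & mask));
			fn(ev, priv);
			tail += ev->size;

			__sync_synchronize();	/* (D) done with the data ... */
			hdr->data_tail = tail;	/* ... so the kernel may reuse
						 * the space right away; this
						 * incremental update could
						 * also be batched, e.g. every
						 * 8 records                  */
		}
	} while (head != hdr->data_head); /* new data pushed meanwhile? */
}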
>> Also few other improvements to use realloc() instead of free() and
>> malloc() combination and for the callback use proper perf_event_header
>> instead of void pointer, so that real applications can use container_of()
>> macro with proper type checking.
>
> FWIW the free() + malloc() was to avoid the needless copy of the
> previous event realloc() may do. It makes sense to use realloc()
> especially if you want to put extra info in front of the buffer, just
> sayin' it wasn't a complete braino ;)
No strong preference from my side. I'd think it might be sensible in any case
for applications to call bpf_perf_event_read_simple() with an already
preallocated buffer; depending on the expected max element size from BPF, a
buffer of e.g. 1 page or so. Given the 512 byte stack space of the BPF prog
and an MTU of 1500, this would more than suffice to avoid new allocations
altogether. Anyway, given we only grow the memory area, I did some testing on
realloc() with an array of pointers to prior malloc()'ed buffers, running 10M
realloc()s at random to increase their size, and saw <1% of cases where the
area had to be moved, so we're hitting a corner case of a corner case. I'm
also okay to leave the free() + malloc() combination, though. :)
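
As an aside, a rough illustration of what the perf_event_header based
callback from the commit message enables on the application side; the struct
layout and names below are assumptions for the example (matching the usual
raw output of bpf_perf_event_output()), not part of the patch:

#include <linux/perf_event.h>
#include <linux/types.h>
#include <stddef.h>
#include <stdio.h>

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Layout of a raw sample as typically pushed via bpf_perf_event_output():
 * the perf_event_header is followed by a u32 size and the raw bytes.
 * Struct and field names are made up for this example. */
struct raw_sample {
	struct perf_event_header header;
	__u32 size;
	unsigned char data[];
};

/* With the callback taking a perf_event_header pointer instead of a void
 * pointer, the application gets type checking and can use container_of(). */
static int handle_event(struct perf_event_header *hdr, void *priv)
{
	struct raw_sample *s;

	if (hdr->type != PERF_RECORD_SAMPLE)
		return 0;	/* e.g. PERF_RECORD_LOST, skip */

	s = container_of(hdr, struct raw_sample, header);
	printf("sample with %u raw bytes\n", s->size);
	return 0;
}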
Thanks,
Daniel