Message-ID: <531acdb6-aef3-fe19-64aa-4255daab9cc1@iogearbox.net>
Date: Fri, 12 Oct 2018 15:30:57 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: alexei.starovoitov@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 2/2] bpf, libbpf: simplify perf RB walk and do
incremental updates
On 10/12/2018 10:39 AM, Daniel Borkmann wrote:
> On 10/12/2018 05:04 AM, Jakub Kicinski wrote:
>> On Thu, 11 Oct 2018 16:02:07 +0200, Daniel Borkmann wrote:
>>> Clean up and improve the bpf_perf_event_read_simple() ring walk a bit
>>> to use a similar tail update scheme as in perf and bcc, allowing the
>>> kernel to make forward progress not only after a full walk.
>>
>> The extra memory barriers won't impact performance? If I read the code
>> correctly we now have:
>>
>> while (bla) {
>> 	head = HEAD
>> 	rmb()
>>
>> 	...
>>
>> 	mb()
>> 	TAIL = tail
>> }
>>
>> Would it make sense to try to piggy back on the mb() for head re-read
>> at least? Perhaps that's a non-issue, just wondering.
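For reference, this is the ordering scheme documented in the comment above
perf_output_put_handle() in kernel/events/ring_buffer.c, which the reply
below refers to by the letters A-D:

    kernel                              user

      if (LOAD ->data_tail) {           LOAD ->data_head
                        (A)             smp_rmb()       (C)
          STORE $data                   LOAD $data
          smp_wmb()     (B)             smp_mb()        (D)
          STORE ->data_head             STORE ->data_tail
      }

    Where A pairs with D, and B pairs with C.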
>
> From the scheme specified in the comment in the prior patch, my understanding
> is that they don't pair (see B and C), so there would be no guarantee that
> the load of head happens before the load of data. Fwiw, I've been using the
> exact same semantics as the user space perf tool walking the perf mmap'ed
> ring buffer (tools/perf/util/mmap.{h,c}) here. Given the kernel doesn't stop
On that note, I'll also respin: after some clarification with PeterZ on why
perf uses {rmb,mb}() barriers today as opposed to the more lightweight
smp_{rmb,mb}() ones, it turns out there is no real reason other than a
historic one, and perf can be changed and made more efficient as well. ;)
> pushing into the ring buffer while user space walks it and indicates how far
> it has consumed data via tail updates, this allows for making room
> successively instead of only after the full run has completed. That way we
> don't make any assumptions in the generic libbpf helper on how slow/quick
> the callback processes data or how full the ring is, etc, and new data the
> kernel pushes can be processed in the same run if necessary. One thing we
> could consider is to batch tail updates, say, every 8 elements, plus a
> final update once we break out of walking the ring; probably okay'ish as a
> heuristic..
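To make that concrete, here is a minimal sketch (not the actual patch) of
such a walk with batched tail updates. walk_ring() and record_fn are made-up
names, size is assumed to be the power-of-two data size of the ring, the
__atomic builtins stand in for the smp_{rmb,mb}()-style helpers discussed
above, and records straddling the ring boundary are not handled:

    #include <linux/perf_event.h>
    #include <stdint.h>

    typedef void (*record_fn)(struct perf_event_header *ev);

    static void walk_ring(struct perf_event_mmap_page *page, uint8_t *base,
                          uint64_t size, record_fn fn)
    {
            uint64_t tail = page->data_tail;
            uint64_t head;
            unsigned int cnt = 0;

            for (;;) {
                    /* Load-acquire pairs with the kernel's store of
                     * data_head (B/C in the scheme above), so the record
                     * loads below cannot be hoisted before it.
                     */
                    head = __atomic_load_n(&page->data_head,
                                           __ATOMIC_ACQUIRE);
                    if (tail == head)
                            break;

                    while (tail != head) {
                            struct perf_event_header *ev;

                            ev = (void *)(base + (tail & (size - 1)));
                            fn(ev);
                            tail += ev->size;

                            /* Incremental, batched tail update: the
                             * store-release pairs with the kernel's load
                             * of data_tail (D/A), letting it make room
                             * before the full walk completes.
                             */
                            if (++cnt % 8 == 0)
                                    __atomic_store_n(&page->data_tail, tail,
                                                     __ATOMIC_RELEASE);
                    }
            }
            /* Final tail update once we break out of the walk. */
            __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
    }

Data the kernel pushes during the walk shows up via the re-read of data_head
in the outer loop, so it can be consumed in the same run.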
>
>>> Also a few other improvements: use realloc() instead of the free() plus
>>> malloc() combination, and pass a proper perf_event_header pointer to the
>>> callback instead of a void pointer, so that real applications can use the
>>> container_of() macro with proper type checking.
>>
>> FWIW the free() + malloc() was to avoid the needless copy of the previous
>> event that realloc() may do. It makes sense to use realloc(), especially
>> if you want to put extra info in front of the buffer, just sayin' it
>> wasn't a complete braino ;)
>
> No strong preference from my side. I'd think it might be sensible in any
> case for applications to call bpf_perf_event_read_simple() with an already
> preallocated buffer; depending on the expected max element size from BPF
> this could e.g. be a buffer of 1 page. Given 512 bytes of stack space in
> the BPF prog and an MTU of 1500, that would more than suffice to avoid new
> allocations altogether. Anyway, given we only ever grow the memory area, I
> did some testing on realloc() with an array of pointers to prior
> malloc()'ed buffers, running 10M realloc()s of randomly increasing sizes
> over them, and saw <1% of cases where the area had to be moved, so we're
> hitting a corner case of a corner case. I'm also ok to leave the
> combination, though. :)
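As a hypothetical usage sketch of that idea (the helper's signature is
assumed to match bpf-next around this point, exact parameter types may
differ; my_sample, handle_event and consume_ring are made-up names): the
caller hands in a preallocated one-page buffer, so records bounded by the
BPF stack or MTU never force an allocation inside the helper, and the
perf_event_header pointer allows container_of():

    #include <linux/perf_event.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <bpf/libbpf.h>

    #ifndef container_of
    #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))
    #endif

    /* Made-up raw sample layout with the header as first member. */
    struct my_sample {
            struct perf_event_header header;
            uint32_t size;
            char data[];
    };

    static enum bpf_perf_event_ret
    handle_event(struct perf_event_header *hdr, void *priv)
    {
            struct my_sample *s = container_of(hdr, struct my_sample,
                                               header);

            /* ... consume s->data / s->size here ... */
            (void)s;
            return LIBBPF_PERF_EVENT_CONT;
    }

    static int consume_ring(void *ring_mem, size_t mmap_size)
    {
            size_t page_size = sysconf(_SC_PAGESIZE);
            size_t buf_len = page_size;
            /* One page covers <= 512 bytes of BPF stack or an MTU-sized
             * (1500 byte) record, so no (re)allocation should happen
             * inside the helper.
             */
            void *buf = malloc(buf_len);
            enum bpf_perf_event_ret ret;

            if (!buf)
                    return -1;
            ret = bpf_perf_event_read_simple(ring_mem, mmap_size, page_size,
                                             &buf, &buf_len, handle_event,
                                             NULL);
            free(buf);
            return ret == LIBBPF_PERF_EVENT_ERROR ? -1 : 0;
    }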
>
> Thanks,
> Daniel