Message-ID: <45d9880e-de9e-abb6-b883-fecb9781738e@fb.com>
Date: Wed, 13 Nov 2019 22:26:01 +0000
From: Yonghong Song <yhs@...com>
To: Brian Vazquez <brianvv@...gle.com>
CC: Brian Vazquez <brianvv.kernel@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S . Miller" <davem@...emloft.net>,
Stanislav Fomichev <sdf@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Petar Penkov <ppenkov@...gle.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [RFC bpf-next 0/3] bpf: adding map batch processing support
On 11/13/19 2:07 PM, Brian Vazquez wrote:
> Hi Yonghong,
>
> Thanks for reviewing it! I'm preparing the changes and will submit them
> soon.
>
> As for the right way to manage authorship, does anyone know what the
> correct approach is? Should I use Yonghong's patch and apply the
> extended support in separate patches (e.g. support per_cpu maps, change
> batch from u64 to __aligned_u64, etc.), or is it fine to apply the
> changes in place and include both sign-offs?
The logic flow of the patch set is most important.
You can add me as a co-signoff if you reuse a significant portion of my code.
>
> Thanks,
> Brian
>
> On Tue, Nov 12, 2019 at 6:34 PM Yonghong Song <yhs@...com> wrote:
>
> On 11/7/19 1:20 PM, Brian Vazquez wrote:
> > This is a follow-up in the effort to batch bpf map operations to
> > reduce the syscall overhead of the map_ops. I initially proposed
> > the idea, and the discussion is here:
> > https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
> >
> > Yonghong talked about this at LPC and also proposed an idea that
> > handles the special (admittedly weird) case of hashtables by
> > traversing with the buckets as a reference instead of a key.
> > Discussion is here:
> > https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
> >
> > This RFC proposes a way to extend batch operations to more data
> > structures by creating generic batch functions that can be used
> > instead of implementing the operations for each individual data
> > structure, reducing the code that needs to be maintained. The series
> > contains the patches used in Yonghong's RFC plus the patch that adds
> > the generic implementation of the operations, along with some testing
> > with pcpu hashmaps and arrays. Note that the pcpu hashmap shouldn't
> > use the generic implementation; it should either have its own
> > implementation or share the one introduced by Yonghong. I added it
> > here just as an example to show that the generic implementation can
> > be easily added to a data structure.
> >
> > What I want to achieve with this RFC is to collect early feedback
> > and see if there are any major concerns before I move forward. I do
> > plan to better separate this into different patches and explain them
> > properly in the commit messages.
>
> Thanks Brian for working on this. The general approach described here
> looks good to me. Having a generic implementation of the batch
> operations looks good for most maps (though not for the hash table,
> queue/stack, etc.).
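>
> To make the idea concrete for readers following along: the generic
> implementation can be built purely on top of the per-map ops that
> every map already provides. A minimal sketch of the idea (names and
> shape are illustrative only, not the code from the series; a real
> version needs locking and copy_to_user() handling):
>
>   /* Walk the map with get_next_key, copying out up to 'count'
>    * entries; the last visited key doubles as the batch token
>    * that the next call resumes from (prev_key == NULL starts
>    * from the beginning).
>    */
>   static int generic_map_lookup_batch(struct bpf_map *map,
>                                       void *prev_key, void *keys,
>                                       void *values, u32 count)
>   {
>       u32 copied = 0;
>
>       while (copied < count &&
>              !map->ops->map_get_next_key(map, prev_key, keys)) {
>           void *value = map->ops->map_lookup_elem(map, keys);
>
>           if (value)
>               memcpy(values, value, map->value_size);
>           prev_key = keys;
>           keys += map->key_size;
>           values += map->value_size;
>           copied++;
>       }
>       return copied;
>   }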
>
> >
> > Current known issues that I would like to discuss are the following:
> >
> >   - Because Yonghong's UAPI definition was done specifically for
> >     iterating buckets, the batch field is u64 and is treated as a
> >     u64 instead of an opaque pointer. This won't work for other data
> >     structures that are going to use a key as a batch token with a
> >     size greater than 64 bits. Although, at this point I think the
> >     only key that couldn't be treated as a u64 is the key of a
> >     hashmap, and the hashmap won't use the generic interface.
>
> The u64 can be changed to a __aligned_u64 opaque value. This way,
> it can represent either a pointer or a 64-bit value.
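>
> Concretely, the anonymous struct in union bpf_attr could look
> something like this (a sketch of the direction only; field names
> and layout are not final):
>
>   struct { /* struct used by BPF_MAP_*_BATCH commands */
>       __aligned_u64 in_batch;  /* start token; NULL to begin */
>       __aligned_u64 out_batch; /* token to resume from next time */
>       __aligned_u64 keys;
>       __aligned_u64 values;
>       __u32 count;    /* in: # of entries; out: # processed */
>       __u32 map_fd;
>       __u64 elem_flags;
>       __u64 flags;
>   } batch;
>
> A bucket index and a user pointer to a key would then fit in the
> same opaque field.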
>
> >   - Not all data structures support delete (because it's not a
> >     valid operation), e.g. arrays. So maybe the
> >     lookup_and_delete_batch command is not needed and we can handle
> >     that operation with a lookup_batch and a flag.
>
> This makes sense.
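>
> E.g. a flag bit in elem_flags (the name below is hypothetical) would
> let one command cover both cases:
>
>   #define BPF_F_BATCH_DELETE (1ULL << 1) /* hypothetical flag */
>
>   /* in the generic lookup_batch loop, after copying the value: */
>   if (attr->batch.elem_flags & BPF_F_BATCH_DELETE)
>       map->ops->map_delete_elem(map, key);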
>
> >   - For delete_batch (not just lookup_and_delete_batch): is this
> >     operation really needed? If so, shouldn't the behaviour be to
> >     delete the keys provided? I did that in my generic
> >     implementation, but Yonghong's delete_batch for a hashmap
> >     deletes buckets.
>
> We need batched delete in bcc. lookup_and_delete_batch is better as
> it can preserve more new map entries. Alternatively, deleting all
> entries after the lookup is another option, but this may remove more
> new map entries. Statistically this may or may not matter, though.
>
> bcc does have a clear_table (clear_map) API, but it is not clear who
> is using it.
>
> So, I do not have a concrete use case for delete_batch yet.
> I tend to think we should have delete_batch for API completeness,
> but maybe other people can comment on this as well.
>
> Maybe in the initial patch set we can skip it. But we should still
> ensure that the user interface data structure can handle batch delete
> if it is needed later. The current data structure should handle this,
> as far as I know.
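>
> For reference, the bcc-style draining of a map would look roughly
> like this from user space (assuming a libbpf wrapper shaped like the
> one this series adds in tools/lib/bpf; the exact signature may
> differ):
>
>   __u32 keys[BATCH_SIZE];         /* sized for this map's key type */
>   __u64 values[BATCH_SIZE];       /* and its value type */
>   __u64 in_batch = 0, out_batch = 0;
>   __u32 count = BATCH_SIZE;
>   bool first = true;
>   int err;
>
>   do {
>       /* The first call passes NULL to start from the beginning;
>        * the wrapper reports ENOENT once the map is exhausted.
>        */
>       err = bpf_map_lookup_and_delete_batch(map_fd,
>                                             first ? NULL : &in_batch,
>                                             &out_batch, keys, values,
>                                             &count, 0, 0);
>       /* consume up to 'count' key/value pairs here */
>       in_batch = out_batch;
>       first = false;
>       count = BATCH_SIZE;
>   } while (!err);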
>
> >
> > Brian Vazquez (1):
> > bpf: add generic batch support
> >
> > Yonghong Song (2):
> > bpf: adding map batch processing support
> > tools/bpf: test bpf_map_lookup_and_delete_batch()
> >
> >   include/linux/bpf.h                          |  21 +
> >   include/uapi/linux/bpf.h                     |  22 +
> >   kernel/bpf/arraymap.c                        |   4 +
> >   kernel/bpf/hashtab.c                         | 331 ++++++++++
> >   kernel/bpf/syscall.c                         | 573 ++++++++++++++----
> >   tools/include/uapi/linux/bpf.h               |  22 +
> >   tools/lib/bpf/bpf.c                          |  59 ++
> >   tools/lib/bpf/bpf.h                          |  13 +
> >   tools/lib/bpf/libbpf.map                     |   4 +
> >   .../map_tests/map_lookup_and_delete_batch.c  | 245 ++++++++
> >   .../map_lookup_and_delete_batch_array.c      | 118 ++++
> >   11 files changed, 1292 insertions(+), 120 deletions(-)
> >   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch.c
> >   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> >
>