linux-kernel - Re: [RFC bpf-next 0/3] bpf: adding map batch processing support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <61f1e2f3-e0d5-cf2e-c16e-807b09bb84e7@fb.com>
Date:   Wed, 13 Nov 2019 02:34:36 +0000
From:   Yonghong Song <yhs@...com>
To:     Brian Vazquez <brianvv@...gle.com>,
        Brian Vazquez <brianvv.kernel@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        "Daniel Borkmann" <daniel@...earbox.net>,
        "David S . Miller" <davem@...emloft.net>
CC:     Stanislav Fomichev <sdf@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "Petar Penkov" <ppenkov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [RFC bpf-next 0/3] bpf: adding map batch processing support



On 11/7/19 1:20 PM, Brian Vazquez wrote:
> This is a follow up in the effort to batch bpf map operations to reduce
> the syscall overhead with the map_ops. I initially proposed the idea and
> the discussion is here:
> https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
> 
> Yonghong talked at the LPC about this and also proposed and idea that
> handles the special weird case of hashtables by doing traversing using
> the buckets as a reference instead of a key. Discussion is here:
> https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
> 
> This RFC proposes a way to extend batch operations for more data
> structures by creating generic batch functions that can be used instead
> of implementing the operations for each individual data structure,
> reducing the code that needs to be maintained. The series contains the
> patches used in Yonghong's RFC and the patch that adds the generic
> implementation of the operations plus some testing with pcpu hashmaps
> and arrays. Note that pcpu hashmap shouldn't use the generic
> implementation and it either should have its own implementation or share
> the one introduced by Yonghong, I added that just as an example to show
> that the generic implementation can be easily added to a data structure.
> 
> What I want to achieve with this RFC is to collect early feedback and see if
> there's any major concern about this before I move forward. I do plan
> to better separate this into different patches and explain them properly
> in the commit messages.

Thanks Brian for working on this. The general approach described here
is good to me. Having a generic implementation for batch operations
looks good for maps (not hash table, queue/stack, etc.)

> 
> Current known issues where I would like to discuss are the followings:
> 
> - Because Yonghong's UAPI definition was done specifically for
>    iterating buckets, the batch field is u64 and is treated as an u64
>    instead of an opaque pointer, this won't work for other data structures
>    that are going to use a key as a batch token with a size greater than
>    64. Although I think at this point the only key that couldn't be
>    treated as a u64 is the key of a hashmap, and the hashmap won't use
>    the generic interface.

The u64 can be changed with a __aligned_u64 opaque value. This way,
it can represent a pointer or a 64bit value.

> - Not all the data structures use delete (because it's not a valid
>    operation) i.e. arrays. So maybe lookup_and_delete_batch command is
>    not needed and we can handle that operation with a lookup_batch and a
>    flag.

This make sense.

> - For delete_batch (not just the lookup_and_delete_batch). Is this
>    operation really needed? If so, shouldn't it be better if the
>    behaviour is delete the keys provided? I did that with my generic
>    implementation but Yonghong's delete_batch for a hashmap deletes
>    buckets.

We need batched delete in bcc. lookup_and_delete_batch is better as
it can preserves more new map entries. Alternatively, deleting
all entries after lookup is another option. But this may remove
more new map entries. Statistically this may or may not matter though.

bcc does have a clear_table (clear_map) API, but not clear who is using it.

So, I did not have a concrete use case for delete_batch yet.
I tend to think we should have delete_batch for API completeness,
but maybe other people can comment on this as well.

Maybe initial patch, we can skip it. But we should still ensure
user interface data structure can handle batch delete if it is
needed later. The current data structure should handle this
as far as I know.

> 
> Brian Vazquez (1):
>    bpf: add generic batch support
> 
> Yonghong Song (2):
>    bpf: adding map batch processing support
>    tools/bpf: test bpf_map_lookup_and_delete_batch()
> 
>   include/linux/bpf.h                           |  21 +
>   include/uapi/linux/bpf.h                      |  22 +
>   kernel/bpf/arraymap.c                         |   4 +
>   kernel/bpf/hashtab.c                          | 331 ++++++++++
>   kernel/bpf/syscall.c                          | 573 ++++++++++++++----
>   tools/include/uapi/linux/bpf.h                |  22 +
>   tools/lib/bpf/bpf.c                           |  59 ++
>   tools/lib/bpf/bpf.h                           |  13 +
>   tools/lib/bpf/libbpf.map                      |   4 +
>   .../map_tests/map_lookup_and_delete_batch.c   | 245 ++++++++
>   .../map_lookup_and_delete_batch_array.c       | 118 ++++
>   11 files changed, 1292 insertions(+), 120 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch.c
>   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
>