[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <867e95e1-cb45-54c8-ce66-d3e49161d5e6@polito.it>
Date: Thu, 9 Aug 2018 09:51:49 -0500
From: Mauricio Vasquez <mauricio.vasquez@...ito.it>
To: Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 1/3] bpf: add bpf queue map
On 08/09/2018 04:02 AM, Daniel Borkmann wrote:
> On 08/09/2018 06:48 AM, Alexei Starovoitov wrote:
>> On Wed, Aug 08, 2018 at 10:08:47PM -0500, Mauricio Vasquez wrote:
>>>> And how about adding three new helpers: push/pop/peek as well?
>>>> Reusing lookup/update is neat, but does lookup == pop
>>>> or does lookup == peek ?
>>>> I suspect it will be confusing.
>>>> Three new helpers cost nothing, but will make bpf progs easier to read.
>>> I agree. I have one doubt here, update/lookup/delete is implemented in all
>>> map types, if the operation is not supported it returns -EINVAL.
>>> For push/pop/peek, should we implement it in all maps or is it better to
>>> check the map type before invoking map->ops->push/pop/seek?
>>> (Maybe checking if map->ops->xxx is NULL will also work)
>> Since push/pop/peak are only for this queue/stack I thought we won't
>> be adding 'ops' for them and just call the helpers from progs.
>> But now I'm having second thoughts, since 3 new commands for syscall
>> feels like overkill.
>> At the same time I still don't feel that lookup == pop is the right alias.
>> Also what peak is going to alias to ?
>> May be let's go back to your original idea with a tweak:
>> push == update
>> peak == lookup
>> pop = lookup + delete
>> In other words in case of stack the bpf_map_lookup_elem will return
>> the pointer to value of top element in the stack and
>> bpf_map_delete_elem will delete that top element.
>> Then in user space we can introduce push/pop always_inline functions like:
>> void bpf_push(struct bpf_map *map, void *value)
>> {
>> bpf_map_update_elem(map, NULL/*key*/, value);
>> }
>>
>> void *bpf_pop(struct bpf_map *map)
>> {
>> void * val = bpf_map_lookup_elem(map, NULL/*key*/);
>> bpf_map_delete_elem(map, NULL/*key*/);
>> return val;
>> }
>>
>> Thoughts?
> I actually think that having new push/peak/pop BPF map helpers would
> be fine, as well as having them sit in map->ops. Initially I'd probably
> leave out the syscall ops counterparts so they can be used only from
> program.
I also think that having the new helpers is good. I don't have a strong
opinion about saving them in map->ops or not, in any case it has to be
verified that push/pop/peek are only invoked in stack/queue maps. I
think we could force in on the verifier, would it be ok?
> Potentially array and per-cpu array could implement them even by having
> an internal pointer to the current slot which we move on push/pop. This
> would potentially also avoid all the RCU callbacks to free the elements,
> elem allocations, list locking, and improve cache locality compared to
> the current implementation.
I could change the implementation, a linked list is used when
BPF_F_NO_PREALLOC is passed, otherwise a plain array with internal
indexes is used.
I have a question about RCU and a possible array-based implementation.
It should be guarantee that a pointer from a pop() operation is valid
for the whole program duration.
If an eBPF program A calls pop() it will get a pointer to the head in
the array, and the head index will move, it is possible that a second
program B (running on a different CPU) pushes a new element overwriting
the element pointed by A.
I think we need to use a more sophisticated mechanism than just an array
and two indexes. Am I missing something?
> Agree that existing ops are not the right alias, but deferring to user
> space as inline function also doesn't really seem like a good fit, imho,
> so I'd prefer rather to have something native. (Aside from that, the
> above inline bpf_pop() would also race between CPUs.)
I think we should have push/pop/peek syscalls as well, having a
bpf_pop() that is race prone would create problems. Users expected maps
operations to be safe, so having one that is not will confuse them.
> Thanks,
> Daniel
>
Thanks,
Mauricio.
Powered by blists - more mailing lists