[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <910eaafa-57a1-2a22-709e-a9107b108d9d@zonque.org>
Date: Tue, 19 Sep 2017 21:48:16 +0200
From: Daniel Mack <daniel@...que.org>
To: Daniel Borkmann <daniel@...earbox.net>,
Craig Gallek <kraigatgoog@...il.com>,
Alexei Starovoitov <ast@...com>
Cc: "David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 1/3] bpf: Implement map_delete_elem for
BPF_MAP_TYPE_LPM_TRIE
Hi,
Thanks for working on this, Craig!
On 09/19/2017 06:12 PM, Daniel Borkmann wrote:
> On 09/19/2017 05:08 PM, Craig Gallek wrote:
>> On Mon, Sep 18, 2017 at 6:53 PM, Alexei Starovoitov <ast@...com> wrote:
>>> On 9/18/17 12:30 PM, Craig Gallek wrote:
> [...]
>>>> +
>>>> + next_bit = extract_bit(key->data, node->prefixlen);
>>>> + /* If we hit a node that has more than one child or is a
>>>> valid
>>>> + * prefix itself, do not remove it. Reset the root of the
>>>> trim
>>>> + * path to its descendant on our path.
>>>> + */
>>>> + if (!(node->flags & LPM_TREE_NODE_FLAG_IM) ||
>>>> + (node->child[0] && node->child[1]))
>>>> + trim = &node->child[next_bit];
>>>> + node = rcu_dereference_protected(
>>>> + node->child[next_bit],
>>>> lockdep_is_held(&trie->lock));
>>>> + }
>>>> +
>>>> + if (!node || node->prefixlen != key->prefixlen ||
>>>> + (node->flags & LPM_TREE_NODE_FLAG_IM)) {
>>>> + ret = -ENOENT;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + trie->n_entries--;
>>>> +
>>>> + /* If the node we are removing is not a leaf node, simply mark it
>>>> + * as intermediate and we are done.
>>>> + */
>>>> + if (rcu_access_pointer(node->child[0]) ||
>>>> + rcu_access_pointer(node->child[1])) {
>>>> + node->flags |= LPM_TREE_NODE_FLAG_IM;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + /* trim should now point to the slot holding the start of a path
>>>> from
>>>> + * zero or more intermediate nodes to our leaf node for deletion.
>>>> + */
>>>> + while ((node = rcu_dereference_protected(
>>>> + *trim, lockdep_is_held(&trie->lock)))) {
>>>> + RCU_INIT_POINTER(*trim, NULL);
>>>> + trim = rcu_access_pointer(node->child[0]) ?
>>>> + &node->child[0] :
>>>> + &node->child[1];
>>>> + kfree_rcu(node, rcu);
>>>
>>> can it be that some of the nodes this loop walks have
>>> both child[0] and [1] ?
>> No, the loop above will push trim down the walk every time it
>> encounters a node with two children. The only other trim assignment
>> is the initial trim = &trie->root. But the only time we would skip
>> the assignment in the loop is if the node being removed is the root.
>> If the root had multiple children and is being removed, it would be
>> handled by the case that turns the node into an intermediate node
>> rather than walking the trim path freeing things.
>
> Looks good to me. We should probably still merge nodes once we turn
> a real node into an im which just has a single child attached to it;
> parent can be im or real node. Thus, we don't need to traverse this
> extra one on lookup.
Right, but only if the the parent of the node allows us to do that,
because the 'next bit' in the lookup key has to match the slot index.
To illustrate, consider the following trie with no IM nodes:
+----------------+
| (1) (R) |
| 192.168.0.0/16 |
| value: 1 |
| [0] [1] |
+----------------+
| |
+----------------+ +------------------+
| (2) | | (3) |
| 192.168.0.0/23 | | 192.168.128.0/24 |
| value: 2 | | value: 3 |
| [0] [1] | | [0] [1] |
+----------------+ +------------------+
|
+----------------+
| (4) |
| 192.168.1.0/24 |
| value: 4 |
| [0] [1] |
+----------------+
If you now try to delete (2), the node has to stay around because (3)
and (4) share the same value in bit 17 (1). If, however, (4) had a
prefix of 192.168.0.0/24, then (2) should be removed completely, and (4)
should be directly attached to (1) as child[0].
With this implementation, a situation in which multiple IM nodes appear
in a chain cannot emerge. And that again should make your trimming
algorithm simpler, because you only need to find an exact match, and
then handle three distinct cases:
a) the node as 0 children: simply remove it and nullify the pointer in
the parent
b) the node has 1 child: apply logic I described above
c) the node has 2 children: turn the node into an IM
Makes sense?
Thanks,
Daniel
Powered by blists - more mailing lists