Message-ID: <20151215225118.GA67370@ast-mbp.thefacebook.com>
Date: Tue, 15 Dec 2015 14:51:19 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Ming Lei <tom.leiming@...il.com>
Cc: linux-kernel@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH 4/6] bpf: hash: convert per-hashtable lock into
per-bucket bit spinlock
On Tue, Dec 15, 2015 at 07:21:02PM +0800, Ming Lei wrote:
> Both htab_map_update_elem() and htab_map_delete_elem() can be
> called from an eBPF program, and they may be in the kernel hot path,
> so it isn't efficient to use a per-hashtable lock in these two
> helpers.
>
> The per-hashtable spinlock is used just for protecting the bucket's
> hlist, and a per-bucket lock should be enough. This patch converts
> the per-hashtable lock into a per-bucket bit spinlock, so that
> contention is decreased a lot and no extra memory is consumed for
> these locks.
>
> Signed-off-by: Ming Lei <tom.leiming@...il.com>
Thank you for working on this. Interesting stuff!
> /* bpf_map_update_elem() can be called in_irq() */
> - raw_spin_lock_irqsave(&htab->lock, flags);
> + raw_local_irq_save(flags);
> + bit_spin_lock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
Can you add helpers for bit_spin_lock/unlock as well, so that the whole
hlist+bit API looks consistent?
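For example, something like this (just a sketch; the hlist_bit_lock()/
hlist_bit_unlock() names are made up here, and HLIST_LOCK_BIT is the
macro from your patch):

#include <linux/list.h>
#include <linux/bit_spinlock.h>

/* take/release the per-bucket lock embedded in head->first */
static inline void hlist_bit_lock(struct hlist_head *h)
{
	bit_spin_lock(HLIST_LOCK_BIT, (unsigned long *)&h->first);
}

static inline void hlist_bit_unlock(struct hlist_head *h)
{
	bit_spin_unlock(HLIST_LOCK_BIT, (unsigned long *)&h->first);
}

Then the update/delete paths don't have to open-code the cast and all
users of the locked hlist go through one place.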
>
> - l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
> + l_old = lookup_elem_raw(hlist_get_head_lock(head, &h), l_new->hash,
> + key, key_size);
>
> if (!l_old && unlikely(atomic_read(&htab->count) >= map->max_entries)) {
> /* if elem with this 'key' doesn't exist and we've reached
> @@ -278,18 +284,20 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
> /* add new element to the head of the list, so that concurrent
> * search will find it before old elem
> */
> - hlist_add_head_rcu(&l_new->hash_node, head);
> + hlist_add_head_rcu_lock(&l_new->hash_node, head);
I think the new macros have confusing names:
+#define hlist_get_first(h) \
+ (struct hlist_node *)((unsigned long)(h)->first & ~HLIST_LOCK_MASK)
+#define hlist_get_lock(h) ((unsigned long)(h)->first & HLIST_LOCK_MASK)
+#define hlist_make_1st_node(h, n) \
+ (struct hlist_node *)((unsigned long)(n) | hlist_get_lock(h))
+
+static inline struct hlist_head *hlist_get_head_lock(
+ struct hlist_head *head, struct hlist_head *new_head)
+{
+ new_head->first = hlist_get_first(head);
+ return new_head;
+}
This new hlist_add_head_rcu_lock() adds a new element and doesn't take a
new lock, so the name is confusing. Maybe rename this new API with a
'locked_hlist' prefix? Then all the normal helpers would convert like:
hlist_add_head_rcu() -> locked_hlist_add_head_rcu()
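For example (naming sketch only; the body is your hlist_add_head_rcu_lock()
from below, unchanged apart from the name, and it relies on the
hlist_get_first()/hlist_make_1st_node() macros from the patch):

#include <linux/rculist.h>

static inline void locked_hlist_add_head_rcu(struct hlist_node *n,
					     struct hlist_head *h)
{
	struct hlist_node *first = hlist_get_first(h);

	n->next = first;
	n->pprev = &h->first;
	/* keep the lock bit(s) in head->first while publishing n */
	rcu_assign_pointer(hlist_first_rcu(h), hlist_make_1st_node(h, n));
	if (first)
		first->pprev = &n->next;
}

/* locked_hlist_del_rcu() etc. would follow the same pattern */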
> if (l_old) {
> - hlist_del_rcu(&l_old->hash_node);
> + hlist_del_rcu_lock(&l_old->hash_node);
and here it would be:
hlist_del_rcu() -> locked_hlist_del_rcu()
Also, is there a race here?
+static inline void hlist_add_head_rcu_lock(struct hlist_node *n,
+ struct hlist_head *h)
+{
+ struct hlist_node *first = hlist_get_first(h);
+
+ n->next = first;
+ n->pprev = &h->first;
+ rcu_assign_pointer(hlist_first_rcu(h), hlist_make_1st_node(h, n));
+ if (first)
+ first->pprev = &n->next;
+}
Do you need cmpxchg when updating hlist_head->first?
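In other words, would the publish need to look something like this
(illustrative only, just spelling out the question; HLIST_LOCK_MASK is
the macro from your patch):

#include <linux/atomic.h>
#include <linux/rculist.h>

/*
 * Publish the new first node with cmpxchg() so that the lock bit(s)
 * stored in head->first cannot be clobbered by a concurrent RMW on
 * the same word.  Whether this is actually needed depends on how
 * bit_spin_lock() waiters touch head->first.
 */
static inline void hlist_add_head_rcu_lock_cmpxchg(struct hlist_node *n,
						   struct hlist_head *h)
{
	struct hlist_node *old, *first, *new;

	do {
		old = READ_ONCE(h->first);
		first = (struct hlist_node *)
			((unsigned long)old & ~HLIST_LOCK_MASK);
		n->next = first;
		n->pprev = &h->first;
		new = (struct hlist_node *)
			((unsigned long)n |
			 ((unsigned long)old & HLIST_LOCK_MASK));
	} while (cmpxchg(&h->first, old, new) != old);

	if (first)
		first->pprev = &n->next;
}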
Overall it looks like a very interesting idea, but I'm not sure about the
trade-off of saving 8 bytes per bucket for a rounded-up spinlock.
The loss of lockdep coverage is concerning.
Have you compared the performance of this bit-lock embedded inside
hlist_head vs just adding a spinlock_t for every hlist_head?
Another concern is that hlist walking may be more costly due to the
requirement of clearing that bit while doing the walk in lookup().
I guess I would prefer a normal spinlock per bucket; the additional
8 * n_buckets bytes don't seem worth optimizing away.
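Roughly something like this (a sketch of the alternative; the 'bucket'
struct and function names are made up here):

#include <linux/spinlock.h>
#include <linux/rculist.h>

struct bucket {
	struct hlist_head head;
	raw_spinlock_t lock;
};

/* the update path keeps the stock hlist API and lockdep coverage */
static void bucket_add_example(struct bucket *b, struct hlist_node *n)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&b->lock, flags);
	hlist_add_head_rcu(n, &b->head);
	raw_spin_unlock_irqrestore(&b->lock, flags);
}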