Message-Id: <20250107085559.3081563-5-houtao@huaweicloud.com>
Date: Tue, 7 Jan 2025 16:55:56 +0800
From: Hou Tao <houtao@...weicloud.com>
To: bpf@...r.kernel.org,
netdev@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>,
Song Liu <song@...nel.org>,
Hao Luo <haoluo@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>,
Daniel Borkmann <daniel@...earbox.net>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>,
Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
houtao1@...wei.com,
xukuohai@...wei.com
Subject: [PATCH bpf-next 4/7] bpf: Support refilling extra_elems in free_htab_elem()
From: Hou Tao <houtao1@...wei.com>
The following patch will move the invocation of check_and_free_fields()
in htab_map_update_elem() outside of the bucket lock. Currently the call
has to stay under the bucket lock because the overwritten element has
already been stashed in htab->extra_elems by the time alloc_htab_elem()
returns. If check_and_free_fields() were invoked after the bucket lock is
released, the stashed element could be reused by a concurrent update,
and the freeing in check_and_free_fields() would race with that reuse.
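To make the race concrete, here is a minimal single-threaded C sketch that replays one bad interleaving by hand. It is a hypothetical model, not kernel code: struct elem, bucket_head, extra_slot, update_stash_under_lock() and free_fields() are illustrative names, a single slot stands in for the per-cpu htab->extra_elems entry, and clearing an int stands in for check_and_free_fields(). Locking is omitted because only the order of the steps matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 'field' stands in for a ptr/special field in the map value. */
struct elem {
	int field;
};

static struct elem elems[2];
static struct elem *bucket_head; /* element currently stored in the bucket */
static struct elem *extra_slot;  /* models one per-cpu extra_elems entry */

/* Old ordering: the overwritten element is stashed in the slot before unlock. */
static struct elem *update_stash_under_lock(int value)
{
	struct elem *l_new = extra_slot, *l_old = bucket_head;

	assert(l_new != NULL);
	l_new->field = value;
	bucket_head = l_new;
	extra_slot = l_old; /* l_old is reusable as soon as the lock is dropped */
	return l_old;
}

/* Models check_and_free_fields(). */
static void free_fields(struct elem *l)
{
	l->field = 0;
}

/* Replays one bad interleaving; returns the value left in the live element. */
static int replay_bad_interleaving(void)
{
	struct elem *stale;

	elems[0].field = 1;
	bucket_head = &elems[0];
	extra_slot = &elems[1];

	stale = update_stash_under_lock(10); /* update A, then bucket unlock */
	update_stash_under_lock(20);         /* update B reuses 'stale' for its value */
	free_fields(stale);                  /* update A's deferred free runs too late */
	return bucket_head->field;           /* 0 instead of 20: B's value was freed */
}
```

Calling replay_bad_interleaving() returns 0: the value written by the second update is wiped out by the first update's late free of the same element.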
The fix splits the reuse and the stashing of extra_elems into two steps:
1) reuse the per-cpu extra_elems while holding the bucket lock;
2) refill the per-cpu extra_elems after releasing the bucket lock.
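The two steps can be sketched in userspace C11 as follows. This is a simplified model under stated assumptions, not the kernel implementation: one bucket, one spare slot standing in for a single CPU's extra_elems entry, an atomic_flag spinlock in place of the bucket lock, and hypothetical helpers update(), free_elem() and model_init(). The model takes the spare element with atomic_exchange_explicit() (acquire) rather than the smp_load_acquire() read mentioned in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one "CPU", one bucket holding one element, one spare slot. */
struct elem {
	int special_field; /* stands in for ptrs/special fields in the map value */
	struct elem *next; /* freelist link */
};

static atomic_flag bucket_lock = ATOMIC_FLAG_INIT;
static _Atomic(struct elem *) extra_slot; /* models one per-cpu extra_elems entry */
static struct elem *freelist;             /* models htab->freelist */
static struct elem *bucket_head;          /* element currently stored in the bucket */
static struct elem pool[2];

static void lock_bucket(void)
{
	while (atomic_flag_test_and_set_explicit(&bucket_lock, memory_order_acquire))
		; /* spin */
}

static void unlock_bucket(void)
{
	atomic_flag_clear_explicit(&bucket_lock, memory_order_release);
}

static void model_init(void)
{
	pool[0].special_field = 1;
	bucket_head = &pool[0];
	atomic_store(&extra_slot, &pool[1]);
	freelist = NULL;
}

/* Step 2 helper: free the fields first, then try to refill the spare slot. */
static void free_elem(struct elem *l, bool refill_extra)
{
	struct elem *expected = NULL;

	assert(l != NULL);
	l->special_field = 0; /* models check_and_free_fields() */
	if (refill_extra &&
	    atomic_compare_exchange_strong_explicit(&extra_slot, &expected, l,
						    memory_order_release,
						    memory_order_relaxed))
		return;
	l->next = freelist; /* slot already occupied: push to the freelist */
	freelist = l;
}

/* Overwrite the bucket's element: reuse under the lock, refill after unlock. */
static bool update(int new_value)
{
	struct elem *l_new, *l_old;

	lock_bucket();
	/* Step 1: take the spare element; the slot is left empty, not refilled. */
	l_new = atomic_exchange_explicit(&extra_slot, NULL, memory_order_acquire);
	if (!l_new) {
		unlock_bucket();
		return false;
	}
	l_new->special_field = new_value;
	l_old = bucket_head;
	bucket_head = l_new;
	unlock_bucket();

	/* Step 2: l_old was never visible in the slot, so freeing it cannot race with reuse. */
	free_elem(l_old, true);
	return true;
}
```

Because the overwritten element only becomes visible in the slot after its fields have been freed, no concurrent update can pick it up mid-free. If the slot has already been refilled, the element falls back to the freelist, mirroring the pcpu_freelist_push() path in free_htab_elem().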
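This patch adds support for refilling the per-cpu extra_elems after the
bucket lock has been released: free_htab_elem() gains a refill_extra
argument, and when it is true the freed element is stashed back into the
per-cpu slot if that slot is empty; otherwise it is pushed to the
freelist as before. All existing callers pass false. Because the refill
may run concurrently, cmpxchg_release() is used. The _release semantics
are necessary to ensure that the freeing of ptrs or special fields in the
map value has completed before the element is reused by a concurrent
update.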
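The release/acquire pairing can be illustrated with C11 atomics and POSIX threads. In this hypothetical sketch, atomic_compare_exchange_strong_explicit() with memory_order_release plays the role of cmpxchg_release(), and atomic_load_explicit() with memory_order_acquire plays the role of smp_load_acquire(); struct elem, extra_slot, refill_thread(), reuse_thread() and run_once() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical element: 'field' stands in for a special field in the map value. */
struct elem {
	int field;
};

static struct elem elem_a = { .field = 1 };
static _Atomic(struct elem *) extra_slot; /* models one per-cpu extra_elems slot */
static int observed_field = -1;           /* what the reusing thread saw */

/* Freeing side: clear the field, then publish with release (like cmpxchg_release()). */
static void *refill_thread(void *arg)
{
	struct elem *l = arg;
	struct elem *expected = NULL;

	l->field = 0; /* plain store, ordered before the release CAS below */
	atomic_compare_exchange_strong_explicit(&extra_slot, &expected, l,
						memory_order_release,
						memory_order_relaxed);
	return NULL;
}

/* Reusing side: acquire-load the slot, after which the element is safe to read. */
static void *reuse_thread(void *arg)
{
	struct elem *l;

	(void)arg;
	while (!(l = atomic_load_explicit(&extra_slot, memory_order_acquire)))
		; /* spin until the slot is refilled */
	assert(l->field == 0); /* the release/acquire pair forbids seeing the stale 1 */
	observed_field = l->field;
	return NULL;
}

/* Runs one refill/reuse round and returns the field value the reuser observed. */
static int run_once(void)
{
	pthread_t refiller, reuser;

	elem_a.field = 1;
	atomic_store_explicit(&extra_slot, NULL, memory_order_relaxed);
	observed_field = -1;
	pthread_create(&reuser, NULL, reuse_thread, NULL);
	pthread_create(&refiller, NULL, refill_thread, &elem_a);
	pthread_join(refiller, NULL);
	pthread_join(reuser, NULL);
	return observed_field;
}
```

Once the acquire load observes the pointer published by the release CAS, the C11 memory model guarantees that the earlier l->field = 0 store is visible, so run_once() always returns 0. Weakening either side to memory_order_relaxed would remove that guarantee.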
Signed-off-by: Hou Tao <houtao1@...wei.com>
---
kernel/bpf/hashtab.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 903447a340d3..3c6eebabb492 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -946,14 +946,28 @@ static void dec_elem_count(struct bpf_htab *htab)
 		atomic_dec(&htab->count);
 }
 
-
-static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
+static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l, bool refill_extra)
 {
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		bpf_map_dec_elem_count(&htab->map);
 		check_and_free_fields(htab, l);
+
+		if (refill_extra) {
+			struct htab_elem **extra;
+
+			/* Use cmpxchg_release() to ensure the freeing of ptrs
+			 * or special fields in map value is completed when the
+			 * update procedure reuses the extra element. It will
+			 * pair with smp_load_acquire() when reading extra_elems
+			 * pointer.
+			 */
+			extra = this_cpu_ptr(htab->extra_elems);
+			if (cmpxchg_release(extra, NULL, l) == NULL)
+				return;
+		}
+
+		bpf_map_dec_elem_count(&htab->map);
 		pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
@@ -1207,7 +1221,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (old_map_ptr)
 			map->ops->map_fd_put_ptr(map, old_map_ptr, true);
 		if (!htab_is_prealloc(htab))
-			free_htab_elem(htab, l_old);
+			free_htab_elem(htab, l_old, false);
 	}
 	return 0;
 err:
@@ -1461,7 +1475,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	htab_unlock_bucket(htab, b, hash, flags);
 
 	if (l)
-		free_htab_elem(htab, l);
+		free_htab_elem(htab, l, false);
 	return ret;
 }
 
@@ -1677,7 +1691,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 		if (is_lru_map)
 			htab_lru_push_free(htab, l);
 		else
-			free_htab_elem(htab, l);
+			free_htab_elem(htab, l, false);
 	}
 
 	return ret;
@@ -1899,7 +1913,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		if (is_lru_map)
 			htab_lru_push_free(htab, l);
 		else
-			free_htab_elem(htab, l);
+			free_htab_elem(htab, l, false);
 	}
 
 next_batch:
--
2.29.2