Message-Id: <20250107085559.3081563-5-houtao@huaweicloud.com>
Date: Tue,  7 Jan 2025 16:55:56 +0800
From: Hou Tao <houtao@...weicloud.com>
To: bpf@...r.kernel.org,
	netdev@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Andrii Nakryiko <andrii@...nel.org>,
	Eduard Zingerman <eddyz87@...il.com>,
	Song Liu <song@...nel.org>,
	Hao Luo <haoluo@...gle.com>,
	Yonghong Song <yonghong.song@...ux.dev>,
	Daniel Borkmann <daniel@...earbox.net>,
	KP Singh <kpsingh@...nel.org>,
	Stanislav Fomichev <sdf@...ichev.me>,
	Jiri Olsa <jolsa@...nel.org>,
	John Fastabend <john.fastabend@...il.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	houtao1@...wei.com,
	xukuohai@...wei.com
Subject: [PATCH bpf-next 4/7] bpf: Support refilling extra_elems in free_htab_elem()

From: Hou Tao <houtao1@...wei.com>

The following patch will move the invocation of check_and_free_fields()
in htab_map_update_elem() outside of the bucket lock. However, the
bucket lock is currently necessary because the overwritten element has
already been stashed in htab->extra_elems when alloc_htab_elem()
returns. If check_and_free_fields() were invoked after the bucket lock
is released, the stashed element could be reused by a concurrent update
procedure, and the freeing in check_and_free_fields() would race with
that reuse and lead to bugs.

The fix breaks the reuse and stash of extra_elems into two steps:
1) reuse the per-cpu extra_elems with the bucket lock held.
2) refill the per-cpu extra_elems after the bucket lock is released.

This patch adds support for refilling the per-cpu extra_elems after the
bucket lock is released. Since the refill may run concurrently with a
reader, cmpxchg_release() is used. The release semantics are necessary
to ensure that the freeing of ptrs or special fields in the map value
completes before the element is reused by a concurrent update procedure.

Signed-off-by: Hou Tao <houtao1@...wei.com>
---
 kernel/bpf/hashtab.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 903447a340d3..3c6eebabb492 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -946,14 +946,28 @@ static void dec_elem_count(struct bpf_htab *htab)
 		atomic_dec(&htab->count);
 }
 
-
-static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
+static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l, bool refill_extra)
 {
 	htab_put_fd_value(htab, l);
 
 	if (htab_is_prealloc(htab)) {
-		bpf_map_dec_elem_count(&htab->map);
 		check_and_free_fields(htab, l);
+
+		if (refill_extra) {
+			struct htab_elem **extra;
+
+			/* Use cmpxchg_release() to ensure the freeing of ptrs
+			 * or special fields in map value is completed when the
+			 * update procedure reuses the extra element. It will
+			 * pair with smp_load_acquire() when reading extra_elems
+			 * pointer.
+			 */
+			extra = this_cpu_ptr(htab->extra_elems);
+			if (cmpxchg_release(extra, NULL, l) == NULL)
+				return;
+		}
+
+		bpf_map_dec_elem_count(&htab->map);
 		pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		dec_elem_count(htab);
@@ -1207,7 +1221,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (old_map_ptr)
 			map->ops->map_fd_put_ptr(map, old_map_ptr, true);
 		if (!htab_is_prealloc(htab))
-			free_htab_elem(htab, l_old);
+			free_htab_elem(htab, l_old, false);
 	}
 	return 0;
 err:
@@ -1461,7 +1475,7 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
 	htab_unlock_bucket(htab, b, hash, flags);
 
 	if (l)
-		free_htab_elem(htab, l);
+		free_htab_elem(htab, l, false);
 	return ret;
 }
 
@@ -1677,7 +1691,7 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 		if (is_lru_map)
 			htab_lru_push_free(htab, l);
 		else
-			free_htab_elem(htab, l);
+			free_htab_elem(htab, l, false);
 	}
 
 	return ret;
@@ -1899,7 +1913,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		if (is_lru_map)
 			htab_lru_push_free(htab, l);
 		else
-			free_htab_elem(htab, l);
+			free_htab_elem(htab, l, false);
 	}
 
 next_batch:
-- 
2.29.2

