Message-Id: <20250107085559.3081563-1-houtao@huaweicloud.com>
Date: Tue, 7 Jan 2025 16:55:52 +0800
From: Hou Tao <houtao@...weicloud.com>
To: bpf@...r.kernel.org,
netdev@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>,
Song Liu <song@...nel.org>,
Hao Luo <haoluo@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>,
Daniel Borkmann <daniel@...earbox.net>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>,
Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
houtao1@...wei.com,
xukuohai@...wei.com
Subject: [PATCH bpf-next 0/7] Free htab element out of bucket lock
From: Hou Tao <houtao1@...wei.com>
Hi,

The patch set continues the previous work [1] of moving the freeing of
htab elements out of the bucket lock. One motivation for the patch set
is the locking problem reported by Sebastian [2]: under PREEMPT_RT, the
freeing of a bpf_timer may acquire a spin-lock (namely
softirq_expiry_lock). However, the freeing procedure for an htab
element already holds a raw spin-lock (namely the bucket lock), so it
triggers the "BUG: scheduling while atomic" warning, as demonstrated by
the selftest patch. Another motivation is to reduce the scope over
which the bucket lock is held.
The patch set is structured as follows:

* Patch #1 moves the element freeing out of the lock for
  htab_lru_map_delete_node()
* Patches #2~#3 move the element freeing out of the lock for
  __htab_map_lookup_and_delete_elem()
* Patches #4~#6 move the element freeing out of the lock for
  htab_map_update_elem()
* Patch #7 adds a selftest for the locking problem
The changes for htab_map_update_elem() require some explanation. The
reason the previous work [1] could not move the element freeing out of
the bucket lock for preallocated hash tables is the ->extra_elems
optimization: when alloc_htab_elem() returns, the old element has
already been stashed in the per-cpu ->extra_elems. To handle that,
patches #4~#6 split the reuse of ->extra_elems and the refill of
->extra_elems into two independent steps: reuse with the bucket lock
held, and refill after unlocking the bucket lock. The downside is that
concurrent updates on the same CPU may need to pop a free element from
the per-cpu free list instead of reusing ->extra_elems directly, but I
expect such cases to be rare.
Please see individual patches for more details. Comments are always
welcome.
[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@huaweicloud.com
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
Hou Tao (7):
bpf: Free special fields after unlock in htab_lru_map_delete_node()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Support refilling extra_elems in free_htab_elem()
bpf: Factor out the element allocation for pre-allocated htab
bpf: Free element after unlock for pre-allocated htab
selftests/bpf: Add test case for the freeing of bpf_timer
kernel/bpf/hashtab.c | 170 ++++++++++--------
.../selftests/bpf/prog_tests/free_timer.c | 165 +++++++++++++++++
.../testing/selftests/bpf/progs/free_timer.c | 71 ++++++++
3 files changed, 332 insertions(+), 74 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c
--
2.29.2