Message-Id: <20250109061901.2620825-1-houtao@huaweicloud.com>
Date: Thu, 9 Jan 2025 14:18:56 +0800
From: Hou Tao <houtao@...weicloud.com>
To: bpf@...r.kernel.org,
netdev@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>,
Song Liu <song@...nel.org>,
Hao Luo <haoluo@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>,
Daniel Borkmann <daniel@...earbox.net>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>,
Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
houtao1@...wei.com,
xukuohai@...wei.com
Subject: [PATCH bpf-next v2 0/5] Free htab element out of bucket lock
From: Hou Tao <houtao1@...wei.com>
Hi,
This patch set continues the previous work [1] of moving all freeing
of htab elements out of the bucket lock. One motivation is the locking
problem reported by Sebastian [2]: under PREEMPT_RT, freeing a bpf_timer
may acquire a spin-lock (namely softirq_expiry_lock). However, the
freeing procedure for an htab element already holds a raw-spin-lock
(namely the bucket lock), so it triggers the warning "BUG: scheduling
while atomic", as demonstrated by the selftests patch. Another
motivation is to reduce the scope covered by the bucket lock.
However, the patch set still keeps the freeing of special fields in
pre-allocated hash maps under the protection of the bucket lock in
htab_map_update_elem().
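
For readers unfamiliar with the pattern, the general idea of "unlink under
the lock, free after unlock" can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct bucket, struct elem,
bucket_delete() and the other names below are hypothetical and do not come
from kernel/bpf/hashtab.c, and a boolean flag stands in for the raw
spin-lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these names come from kernel/bpf/hashtab.c. */
struct elem {
	int key;
	struct elem *next;
};

struct bucket {
	bool locked;		/* models the bucket lock being held */
	struct elem *head;
};

/* Counts frees that happened while the bucket lock was held. */
static int frees_under_lock;

static void bucket_lock(struct bucket *b)
{
	assert(!b->locked);
	b->locked = true;
}

static void bucket_unlock(struct bucket *b)
{
	assert(b->locked);
	b->locked = false;
}

/* Models a free path that may itself need a sleeping lock. */
static void elem_free(struct bucket *b, struct elem *e)
{
	if (b->locked)
		frees_under_lock++;
	free(e);
}

/* Returns 0 on success, -1 if allocation fails. */
static int bucket_insert(struct bucket *b, int key)
{
	struct elem *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->key = key;
	bucket_lock(b);
	e->next = b->head;
	b->head = e;
	bucket_unlock(b);
	return 0;
}

/* Unlink under the lock, free only after unlock.
 * Returns 0 if the key was found, -1 otherwise.
 */
static int bucket_delete(struct bucket *b, int key)
{
	struct elem **pp, *victim = NULL;

	bucket_lock(b);
	for (pp = &b->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	bucket_unlock(b);

	if (!victim)
		return -1;
	elem_free(b, victim);	/* the lock is no longer held here */
	return 0;
}
```

In this toy model, the locked section only covers the list manipulation, so
frees_under_lock stays at zero after any sequence of bucket_insert() and
bucket_delete() calls.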
The patch set is structured as follows:
* Patch #1 moves the element freeing out of lock for
htab_lru_map_delete_node()
* Patch #2~#3 move the element freeing out of lock for
__htab_map_lookup_and_delete_elem()
* Patch #4 cancels the bpf_timer in two steps to fix the locking
problem in htab_map_update_elem().
* Patch #5 adds a selftest for the locking problem
Please see individual patches for more details. Comments are always
welcome.
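
As a rough illustration of what cancelling a timer "in two steps" can look
like, here is a userspace toy model. It is a sketch under assumptions, not
the code of patch #4: the first step is assumed to be a non-blocking cancel
attempt (similar in spirit to a try-to-cancel call that reports -1 while
the callback is running), and the second step hands a still-running timer
to a worker that is allowed to wait, as the patch title suggests. All
names (struct toy_timer, toy_cancel_and_free(), toy_run_deferred()) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in kernel/bpf/helpers.c. */
struct toy_timer {
	bool armed;
	bool callback_running;
	bool freed;
	struct toy_timer *next_deferred;
};

/* Timers waiting for the (modelled) worker. */
static struct toy_timer *deferred_list;

/* Non-blocking cancel: -1 if the callback is running, 1 if an armed
 * timer was cancelled, 0 if the timer was idle.
 */
static int toy_try_to_cancel(struct toy_timer *t)
{
	if (t->callback_running)
		return -1;
	if (t->armed) {
		t->armed = false;
		return 1;
	}
	return 0;
}

/* Step 1: never waits, so it is safe in atomic context.
 * Returns true if the timer was released immediately.
 */
static bool toy_cancel_and_free(struct toy_timer *t)
{
	if (toy_try_to_cancel(t) >= 0) {
		t->freed = true;
		return true;
	}
	/* The callback is running: defer instead of waiting here. */
	t->next_deferred = deferred_list;
	deferred_list = t;
	return false;
}

/* Step 2: models the worker, which may wait for the callback.
 * Returns the number of timers it released.
 */
static int toy_run_deferred(void)
{
	int n = 0;

	while (deferred_list) {
		struct toy_timer *t = deferred_list;

		deferred_list = t->next_deferred;
		assert(!t->freed);
		t->callback_running = false;	/* models waiting for the callback */
		t->armed = false;
		t->freed = true;
		n++;
	}
	return n;
}
```

The point of the split in this model is that the caller of
toy_cancel_and_free() never blocks; only toy_run_deferred(), which stands in
for a context that may sleep, waits for a running callback.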
v2:
* cancel the bpf_timer in two steps instead of breaking the reuse and
the refill of per-cpu ->extra_elems into two steps
v1: https://lore.kernel.org/bpf/20250107085559.3081563-1-houtao@huaweicloud.com
Hou Tao (5):
bpf: Free special fields after unlock in htab_lru_map_delete_node()
bpf: Bail out early in __htab_map_lookup_and_delete_elem()
bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
bpf: Cancel the running bpf_timer through kworker
selftests/bpf: Add test case for the freeing of bpf_timer
kernel/bpf/hashtab.c | 60 ++++---
kernel/bpf/helpers.c | 17 +-
.../selftests/bpf/prog_tests/free_timer.c | 165 ++++++++++++++++++
.../testing/selftests/bpf/progs/free_timer.c | 71 ++++++++
4 files changed, 280 insertions(+), 33 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c
--
2.29.2