Message-Id: <20250107085559.3081563-1-houtao@huaweicloud.com>
Date: Tue,  7 Jan 2025 16:55:52 +0800
From: Hou Tao <houtao@...weicloud.com>
To: bpf@...r.kernel.org,
	netdev@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Andrii Nakryiko <andrii@...nel.org>,
	Eduard Zingerman <eddyz87@...il.com>,
	Song Liu <song@...nel.org>,
	Hao Luo <haoluo@...gle.com>,
	Yonghong Song <yonghong.song@...ux.dev>,
	Daniel Borkmann <daniel@...earbox.net>,
	KP Singh <kpsingh@...nel.org>,
	Stanislav Fomichev <sdf@...ichev.me>,
	Jiri Olsa <jolsa@...nel.org>,
	John Fastabend <john.fastabend@...il.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	houtao1@...wei.com,
	xukuohai@...wei.com
Subject: [PATCH bpf-next 0/7] Free htab element out of bucket lock

From: Hou Tao <houtao1@...wei.com>

Hi,

This patch set continues the previous work [1] and moves all freeing of
htab elements out of the bucket lock. One motivation is the locking
problem reported by Sebastian [2]: under PREEMPT_RT, freeing a bpf_timer
may acquire a spin-lock (namely softirq_expiry_lock). However, the
freeing procedure for an htab element already holds a raw-spin-lock
(namely the bucket lock), so it triggers the warning "BUG: scheduling
while atomic", as demonstrated by the selftests patch. Another
motivation is to reduce the scope covered by the bucket lock.
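
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of "unlink under the lock, free after unlock". It is not the code
in kernel/bpf/hashtab.c: every name (struct bucket, bucket_delete(),
free_elem(), ...) is hypothetical, a plain bool stands in for the raw
spin-lock, and the assert() in free_elem() stands in for the fact that
the free path may take a sleeping lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model types; not taken from kernel/bpf/hashtab.c. */
struct elem {
	int key;
	struct elem *next;
};

struct bucket {
	bool locked;		/* stands in for the raw spin-lock */
	struct elem *head;
};

static void bucket_lock(struct bucket *b)
{
	assert(!b->locked);
	b->locked = true;
}

static void bucket_unlock(struct bucket *b)
{
	assert(b->locked);
	b->locked = false;
}

/* Models a free path that must not run with the bucket lock held. */
static void free_elem(const struct bucket *b, struct elem *e)
{
	assert(!b->locked);
	free(e);
}

/* Link a new element at the head of the bucket. */
static bool bucket_insert(struct bucket *b, int key)
{
	struct elem *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->key = key;
	bucket_lock(b);
	e->next = b->head;
	b->head = e;
	bucket_unlock(b);
	return true;
}

/* Unlink the element under the lock, free it only after unlocking. */
static bool bucket_delete(struct bucket *b, int key)
{
	struct elem *victim = NULL;
	struct elem **pp;

	bucket_lock(b);
	for (pp = &b->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	bucket_unlock(b);

	if (!victim)
		return false;
	free_elem(b, victim);
	return true;
}
```

Calling free_elem() before bucket_unlock() in this model would trip the
assert, which is the userspace analogue of the warning described above.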

The patch set is structured as follows:

* Patch #1 moves the element freeing out of lock for
  htab_lru_map_delete_node()
* Patch #2~#3 move the element freeing out of lock for
  __htab_map_lookup_and_delete_elem()
* Patch #4~#6 move the element freeing out of lock for
  htab_map_update_elem()
* Patch #7 adds a selftest for the locking problem

The changes for htab_map_update_elem() require some explanation. The
previous work [1] could not move the element freeing out of the bucket
lock for the preallocated hash table because of the ->extra_elems
optimization: when alloc_htab_elem() returns, the existing old element
has already been stashed in the per-cpu ->extra_elems. To handle that,
patch #4~#6 break the reuse of ->extra_elems and the refill of
->extra_elems into two independent steps: the reuse is done with the
bucket lock held and the refill is done after unlocking the bucket lock.
The downside is that concurrent updates on the same CPU may need to pop
a free element from the per-cpu list instead of reusing ->extra_elems
directly, but I think such a case will be rare.
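
The two-step split can be illustrated with a small userspace model. This
is only a sketch under assumptions: struct cpu_cache, reuse_extra() and
refill_extra() are hypothetical names, the single-threaded model ignores
real per-cpu access and locking, and pushing the old element to the free
list when the slot is already occupied is my assumption for the model,
not necessarily what the patches do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not taken from kernel/bpf/hashtab.c. */
struct elem {
	int value;
	struct elem *next;
};

struct cpu_cache {
	struct elem *extra;	/* models the per-cpu ->extra_elems slot */
	struct elem *free_list;	/* models the per-cpu free list */
};

static struct elem *pop_free(struct cpu_cache *c)
{
	struct elem *e = c->free_list;

	if (e)
		c->free_list = e->next;
	return e;
}

static void push_free(struct cpu_cache *c, struct elem *e)
{
	e->next = c->free_list;
	c->free_list = e;
}

/*
 * Step 1, with the bucket lock held: take the spare element if the slot
 * is populated, otherwise fall back to the free list.
 */
static struct elem *reuse_extra(struct cpu_cache *c)
{
	struct elem *e = c->extra;

	if (e) {
		c->extra = NULL;
		return e;
	}
	return pop_free(c);
}

/*
 * Step 2, after the bucket lock is released: the replaced element
 * refills the slot, or goes to the free list if the slot is already
 * occupied (an assumption of this model).
 */
static void refill_extra(struct cpu_cache *c, struct elem *old)
{
	if (!c->extra)
		c->extra = old;
	else
		push_free(c, old);
}
```

Between reuse_extra() and refill_extra() the slot is empty, so a second
update on the same CPU in that window takes the pop_free() path; that is
the downside mentioned above.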

Please see individual patches for more details. Comments are always
welcome.

[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@huaweicloud.com
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de

Hou Tao (7):
  bpf: Free special fields after unlock in htab_lru_map_delete_node()
  bpf: Bail out early in __htab_map_lookup_and_delete_elem()
  bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()
  bpf: Support refilling extra_elems in free_htab_elem()
  bpf: Factor out the element allocation for pre-allocated htab
  bpf: Free element after unlock for pre-allocated htab
  selftests/bpf: Add test case for the freeing of bpf_timer

 kernel/bpf/hashtab.c                          | 170 ++++++++++--------
 .../selftests/bpf/prog_tests/free_timer.c     | 165 +++++++++++++++++
 .../testing/selftests/bpf/progs/free_timer.c  |  71 ++++++++
 3 files changed, 332 insertions(+), 74 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/free_timer.c
 create mode 100644 tools/testing/selftests/bpf/progs/free_timer.c

-- 
2.29.2

