Message-ID: <d4b8843b-c5dc-4468-996a-bacc2db63f11@iogearbox.net>
Date: Mon, 19 Jan 2026 20:47:57 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Leon Hwang <leon.hwang@...ux.dev>, bpf@...r.kernel.org
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Shuah Khan <shuah@...nel.org>,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
kernel-patches-bot@...com, Kumar Kartikeya Dwivedi <memxor@...il.com>
Subject: Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when
popping LRU free nodes
On 1/19/26 3:21 PM, Leon Hwang wrote:
> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
> on contended LRU locks.
>
> If the global or per-CPU LRU lock is unavailable, refuse to refill the
> local free list and return NULL instead. This allows callers to back
> off safely rather than blocking or re-entering the same lock context.
>
> This change avoids lockdep warnings and potential deadlocks caused by
> re-entrant LRU lock acquisition from NMI context, as shown below:
>
> [ 418.260323] bpf_testmod: oh no, recursing into test_1, recursion_misses 1
> [ 424.982207] ================================
> [ 424.982216] WARNING: inconsistent lock state
> [ 424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [ 424.982314] *** DEADLOCK ***
> [...]
>
> Signed-off-by: Leon Hwang <leon.hwang@...ux.dev>
> ---
> kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
Documentation/bpf/map_lru_hash_update.dot needs update?
> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> index c091f3232cc5..03d37f72731a 100644
> --- a/kernel/bpf/bpf_lru_list.c
> +++ b/kernel/bpf/bpf_lru_list.c
> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct bpf_lru_list *l,
> raw_spin_unlock_irqrestore(&l->lock, flags);
> }
>
> -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> struct bpf_lru_locallist *loc_l)
> {
> struct bpf_lru_list *l = &lru->common_lru.lru_list;
> struct bpf_lru_node *node, *tmp_node;
> unsigned int nfree = 0;
>
> - raw_spin_lock(&l->lock);
> + if (!raw_spin_trylock(&l->lock))
> + return false;
>
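Just to spell out my reading of the rest of the hunk (sketch only, not the
actual patch): the caller presumably maps the new false return to a NULL
node, which the htab update path then reports as -ENOMEM, roughly:

  static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
                                                      u32 hash)
  {
          ...
          raw_spin_lock_irqsave(&loc_l->lock, flags);

          node = __local_list_pop_free(loc_l);
          if (!node) {
                  /* Trylock failed inside: back off instead of spinning,
                   * leaving node == NULL for the caller.
                   */
                  if (bpf_lru_list_pop_free_to_local(lru, loc_l))
                          node = __local_list_pop_free(loc_l);
          }
          ...
  }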
Could you provide some more analysis on the effect this has on real-world
programs? Presumably they'll unexpectedly encounter much more frequent
-ENOMEM errors from bpf_map_update_elem even though memory might still be
available and it is merely the locks that are contended?
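To make that concrete, a user-space caller would then have to treat
-ENOMEM as a transient condition and retry, e.g. (hypothetical wrapper,
libbpf assumed):

  #include <errno.h>
  #include <bpf/bpf.h>

  /* Hypothetical helper: with this patch, bpf_map_update_elem() on an LRU
   * map can fail with -ENOMEM purely because the LRU locks were contended,
   * not because memory is actually exhausted, so callers may need to retry.
   */
  static int lru_update_retry(int map_fd, const void *key, const void *value,
                              int max_retries)
  {
          int err;

          do {
                  err = bpf_map_update_elem(map_fd, key, value, BPF_ANY);
          } while (err == -ENOMEM && max_retries-- > 0);

          return err;
  }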
Also, have you considered rqspinlock as a potential candidate to detect
deadlocks?
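Rough sketch of what I have in mind (assuming the resilient queued
spinlock API from include/asm-generic/rqspinlock.h, where l->lock would
become an rqspinlock_t; raw_res_spin_lock() keeps spinning on plain
contention and only returns an error such as -EDEADLK or -ETIMEDOUT once
an actual deadlock or timeout is detected):

  #include <asm-generic/rqspinlock.h>

  /* Sketch only: plain contention still spins as before; only detected
   * deadlocks/timeouts fail the lock attempt, so the -ENOMEM fallout
   * would be limited to genuinely pathological cases.
   */
  static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
                                             struct bpf_lru_locallist *loc_l)
  {
          struct bpf_lru_list *l = &lru->common_lru.lru_list;

          if (raw_res_spin_lock(&l->lock))
                  return false;
          ...
          raw_res_spin_unlock(&l->lock);
          return true;
  }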