Message-Id: <a4904d743818169a9a8c1e30dd51ee7a8672bf2b883e973dcca489ff28a3e708@mail.kernel.org>
Date: Mon, 19 Jan 2026 18:46:30 +0000 (UTC)
From: bot+bpf-ci@...nel.org
To: leon.hwang@...ux.dev,bpf@...r.kernel.org
Cc: martin.lau@...ux.dev,ast@...nel.org,daniel@...earbox.net,andrii@...nel.org,eddyz87@...il.com,song@...nel.org,yonghong.song@...ux.dev,john.fastabend@...il.com,kpsingh@...nel.org,sdf@...ichev.me,haoluo@...gle.com,jolsa@...nel.org,shuah@...nel.org,leon.hwang@...ux.dev,linux-kernel@...r.kernel.org,linux-kselftest@...r.kernel.org,kernel-patches-bot@...com,ast@...nel.org,andrii@...nel.org,daniel@...earbox.net,martin.lau@...nel.org,eddyz87@...il.com,yonghong.song@...ux.dev,clm@...a.com,ihor.solodrai@...ux.dev
Subject: Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> index c091f3232cc5..03d37f72731a 100644
> --- a/kernel/bpf/bpf_lru_list.c
> +++ b/kernel/bpf/bpf_lru_list.c
[ ... ]
> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
>
> loc_l = per_cpu_ptr(clru->local_list, cpu);
>
> - raw_spin_lock_irqsave(&loc_l->lock, flags);
> + if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
> + return NULL;
>
> node = __local_list_pop_free(loc_l);
> - if (!node) {
> - bpf_lru_list_pop_free_to_local(lru, loc_l);
> + if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
> node = __local_list_pop_free(loc_l);
> - }
>
> if (node)
> __local_list_add_pending(lru, loc_l, cpu, node, hash);
The trylock conversion here looks correct, but what about the steal path
that follows later in bpf_common_lru_pop_free()? Looking at the existing
code (not shown in this diff), the steal loop still uses unconditional
raw_spin_lock_irqsave():
	do {
		steal_loc_l = per_cpu_ptr(clru->local_list, steal);
		raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
		...
	} while (...);
If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
can the same deadlock occur when an NMI interrupts the steal loop and
the NMI handler then tries to acquire the same steal_loc_l->lock?
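If so, one way to close that window (a sketch only, not tested against
this series) would be to convert the steal loop to the same trylock
pattern, skipping a contended CPU rather than spinning on it:
	do {
		steal_loc_l = per_cpu_ptr(clru->local_list, steal);

		/* Skip this CPU if its lock is already held, e.g. by
		 * the context this NMI interrupted.
		 */
		if (raw_spin_trylock_irqsave(&steal_loc_l->lock, flags)) {
			...
			raw_spin_unlock_irqrestore(&steal_loc_l->lock, flags);
		}
	} while (...);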
Similarly, after a successful steal, there is another unconditional lock:
	if (node) {
		raw_spin_lock_irqsave(&loc_l->lock, flags);
		__local_list_add_pending(lru, loc_l, cpu, node, hash);
		raw_spin_unlock_irqrestore(&loc_l->lock, flags);
	}
Should these also use trylock to maintain consistency with the stated goal
of avoiding NMI-context deadlocks?
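For this add-pending path, a matching conversion could look like the
sketch below. Note the open question it raises: if the trylock fails,
the already-stolen node has to be handed back somewhere (e.g. to the
global free list) rather than leaked:
	if (node) {
		if (!raw_spin_trylock_irqsave(&loc_l->lock, flags)) {
			/* Sketch only: a fallback is needed here so the
			 * stolen node is not leaked, e.g. returning it
			 * to the global free list before giving up.
			 */
			return NULL;
		}
		__local_list_add_pending(lru, loc_l, cpu, node, hash);
		raw_spin_unlock_irqrestore(&loc_l->lock, flags);
	}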
---
AI reviewed your patch. Please fix the bug or reply explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21147913717