Message-ID: <CAADnVQKMOK8jMcYugtrjsqh3O-C_hUsKVL6jKfcGVD5g_1pVxQ@mail.gmail.com>
Date: Mon, 19 Jan 2026 18:01:52 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Leon Hwang <leon.hwang@...ux.dev>
Cc: bot+bpf-ci@...nel.org, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, Eduard <eddyz87@...il.com>,
Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Shuah Khan <shuah@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, kernel-patches-bot@...com,
Martin KaFai Lau <martin.lau@...nel.org>, Chris Mason <clm@...a.com>,
Ihor Solodrai <ihor.solodrai@...ux.dev>
Subject: Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when
popping LRU free nodes
On Mon, Jan 19, 2026 at 5:57 PM Leon Hwang <leon.hwang@...ux.dev> wrote:
>
>
>
> On 20/1/26 02:46, bot+bpf-ci@...nel.org wrote:
> >> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> >> index c091f3232cc5..03d37f72731a 100644
> >> --- a/kernel/bpf/bpf_lru_list.c
> >> +++ b/kernel/bpf/bpf_lru_list.c
> >
> > [ ... ]
> >
> >> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
> >>
> >> loc_l = per_cpu_ptr(clru->local_list, cpu);
> >>
> >> - raw_spin_lock_irqsave(&loc_l->lock, flags);
> >> + if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
> >> + return NULL;
> >>
> >> node = __local_list_pop_free(loc_l);
> >> - if (!node) {
> >> - bpf_lru_list_pop_free_to_local(lru, loc_l);
> >> + if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
> >> node = __local_list_pop_free(loc_l);
> >> - }
> >>
> >> if (node)
> >> __local_list_add_pending(lru, loc_l, cpu, node, hash);
> >
> > The trylock conversion here looks correct, but what about the steal path
> > that follows later in bpf_common_lru_pop_free()? Looking at the existing
> > code (not shown in this diff), the steal loop still uses unconditional
> > raw_spin_lock_irqsave():
> >
> > do {
> > steal_loc_l = per_cpu_ptr(clru->local_list, steal);
> >
> > raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
> > ...
> > } while (...);
> >
> > If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
> > can the same deadlock occur when an NMI arrives during the steal loop
> > and the NMI handler tries to acquire the same steal_loc_l->lock?
> >
> > Similarly, after a successful steal, there is another unconditional lock:
> >
> > if (node) {
> > raw_spin_lock_irqsave(&loc_l->lock, flags);
> > __local_list_add_pending(lru, loc_l, cpu, node, hash);
> > raw_spin_unlock_irqrestore(&loc_l->lock, flags);
> > }
> >
> > Should these also use trylock to maintain consistency with the stated goal
> > of avoiding NMI-context deadlocks?
> >
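
For concreteness, a minimal sketch of what the bot is suggesting, i.e. the
steal path given the same trylock treatment as the local-list fast path.
The helper names come from the existing kernel/bpf/bpf_lru_list.c; the
loop-advance detail is elided and the failure handling is a purely
hypothetical policy, not part of the posted patch:

	do {
		steal_loc_l = per_cpu_ptr(clru->local_list, steal);

		/* Hypothetical: skip a contended remote CPU instead of spinning on it. */
		if (raw_spin_trylock_irqsave(&steal_loc_l->lock, flags)) {
			node = __local_list_pop_free(steal_loc_l);
			if (!node)
				node = __local_list_pop_pending(lru, steal_loc_l);
			raw_spin_unlock_irqrestore(&steal_loc_l->lock, flags);
		}

		/* advance 'steal' to the next possible CPU, as the existing loop does */
	} while (!node && steal != first_steal);

	if (node) {
		/*
		 * A trylock failure here has no clean recovery: the node has
		 * already been taken off the remote CPU's list and must go
		 * somewhere.  Returning NULL as written would strand it.
		 */
		if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
			return NULL;
		__local_list_add_pending(lru, loc_l, cpu, node, hash);
		raw_spin_unlock_irqrestore(&loc_l->lock, flags);
	}

Even as a sketch, the post-steal case shows why a partial conversion gets
awkward, which is essentially the objection raised below.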
>
> This patch is not intended to eliminate all possible deadlock scenarios.
> Its goal is to avoid deadlocks caused by long-lived critical sections
> in the free-node pop paths, where lock contention can persist long enough
> for an NMI on the same CPU to re-enter the path and attempt to re-acquire
> the held lock.
>
> The steal path and the post-steal update are both short-lived critical
> sections. They do not exhibit the same contention characteristics and
> have not been observed to trigger the reported deadlock scenarios.
> Converting these paths to trylock would add complexity without clear
> benefit, and is therefore unnecessary for the stated goal of this change.
The AI is correct: either everything needs to be converted or none of it.
Adding trylock in a few places because syzbot found them is not fixing anything;
it just silences one (or a few?) syzbot reports.
As I said in the other email, trylock is not an option.
rqspinlock is the only true way of addressing potential deadlocks.
If it's too hard, then leave it as-is. Do not hack things halfway.
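
For reference, a minimal sketch of what the rqspinlock direction could look
like on the local-list fast path of the quoted hunk. It assumes loc_l->lock
is converted to an rqspinlock_t and uses the raw_res_spin_lock_irqsave() /
raw_res_spin_unlock_irqrestore() helpers (which report a detected deadlock
or timeout as an error instead of spinning forever), as the BPF hashtab
already does; none of this is from the posted series:

	loc_l = per_cpu_ptr(clru->local_list, cpu);

	/* Fails with an error on a detected deadlock or timeout instead of hanging. */
	if (raw_res_spin_lock_irqsave(&loc_l->lock, flags))
		return NULL;

	node = __local_list_pop_free(loc_l);
	if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
		node = __local_list_pop_free(loc_l);

	if (node)
		__local_list_add_pending(lru, loc_l, cpu, node, hash);

	raw_res_spin_unlock_irqrestore(&loc_l->lock, flags);

For this to actually address the problem, the same conversion would have to
cover the bpf_lru_list lock taken inside bpf_lru_list_pop_free_to_local()
and the per-CPU locks on the steal path as well.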