[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3YRi+ACP1+uCDaDZDPGobrZiUmZrPZfr73W1cHCtKHEgw@mail.gmail.com>
Date: Wed, 29 Oct 2025 14:57:29 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Leon Hwang <leon.hwang@...ux.dev>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>, patchwork-bot+netdevbpf@...nel.org,
bpf@...r.kernel.org, ast@...nel.org, andrii@...nel.org, daniel@...earbox.net,
martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com,
linux-kernel@...r.kernel.org, kernel-patches-bot@...com
Subject: Re: [PATCH bpf v3 0/4] bpf: Free special fields when update hash and
local storage maps
On Wed, Oct 29, 2025 at 2:50 PM Leon Hwang <leon.hwang@...ux.dev> wrote:
>
>
>
> On 29/10/25 04:22, Andrii Nakryiko wrote:
> > On Tue, Oct 28, 2025 at 11:10 AM <patchwork-bot+netdevbpf@...nel.org> wrote:
[......]
> > [ 424.982379] ? bpf_lru_pop_free+0x2c6/0x1a50
> > [ 424.982382] bpf_lru_pop_free+0x2c6/0x1a50
>
> Right, this is the classic NMI vs spinlock deadlock:
>
> Process Context (CPU 0) NMI Context (CPU 0)
> ======================= ===================
>
> syscall()
> |
> +-> htab_lru_map_update_elem()
> |
> +-> bpf_lru_pop_free()
> |
> +-> spin_lock_irqsave(&lock)
> | +-------------------+
> | | LOCK ACQUIRED [Y] |
> | | IRQs DISABLED |
> | +-------------------+
> |
> +-> [Critical Section]
> | |
> | | Working with LRU...
> | |
> | | +-----------------------+
> | |<---------------------+ ! NMI FIRES! |
> | | +-----------------------+
> | | | (IRQs disabled but |
> | | | NMI ignores that!) |
> | | +-----------------------+
> | | |
> | | [INTERRUPTED] |
> | | [Context saved] |
> | | v
> | | perf_event_nmi_handler()
> | | |
> | | +-> BPF program
> | | |
> | | +-> htab_lru_map_
> | | | update_elem()
> | | |
> | | +-> bpf_lru_pop_
> | | | free()
> | | |
> | | +-> spin_lock_
> | | | irqsave(&lock)
> | | | +------------+
> | | | | TRIES TO |
> | | | | ACQUIRE |
> | | | | SAME LOCK! |
> | | | +------------+
> | | | |
> | | | v
> | | | +------------+
> | |<--------------------------------+---+ ! DEADLOCK |
> | | | +------------+
> | | | | Lock held |
> | | Still holding lock... | | by process |
> | | Waiting for NMI to finish ---+ | | context |
> | | | | | |
> | | | | | NMI waits |
> | | | | | for same |
> | | | | | lock |
> | | | | +------------+
> | | | | |
> | | | | v
> | | | | [Spin forever]
> | | | | |
> | | | +--------+
> | | | (Circular wait)
> | | |
> | | +-> SYSTEM HUNG
> | |
> | +-> [Never reached]
> |
> +-> spin_unlock_irqrestore(&lock)
> [Never reached]
>
>
> +---------------------------------------------------------------------+
> | DEADLOCK SUMMARY |
> +---------------------------------------------------------------------+
> | |
> | Process Context: Holds &loc_l->lock, waiting for NMI to finish |
> | |
> | NMI Context: Trying to acquire &loc_l->lock |
> | (same lock, same CPU) |
> | |
> | Result: Both contexts wait for each other = DEADLOCK |
> | |
> +---------------------------------------------------------------------+
>
> We can fix this by converting the raw_spinlock_t to trylock-based
> approach, similar to the fix for ringbuf in
> commit a650d38915c1 ("bpf: Convert ringbuf map to rqspinlock").
Nice shot!
>
> In bpf_common_lru_pop_free(), we could use:
>
> if (!raw_res_spin_lock_irqsave(&loc_l->lock, flags))
> return NULL;
>
> However, this deadlock is pre-existing and not introduced by this
> series. It's better to send a separate fix for this deadlock.
>
> Hi Menglong, could you follow up on the deadlock fix?
Yeah, with pleasure. As I see, this is not the only place
that can cause deadlock and rqspinlock should be used. And
I'll follow up on such deadlocks.
Thanks!
Menglong Dong
>
> Thanks,
> Leon
>
Powered by blists - more mailing lists