linux-kernel - Re: [PATCH bpf v3 0/4] bpf: Free special fields when update hash and local storage maps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3YRi+ACP1+uCDaDZDPGobrZiUmZrPZfr73W1cHCtKHEgw@mail.gmail.com>
Date: Wed, 29 Oct 2025 14:57:29 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Leon Hwang <leon.hwang@...ux.dev>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>, patchwork-bot+netdevbpf@...nel.org, 
	bpf@...r.kernel.org, ast@...nel.org, andrii@...nel.org, daniel@...earbox.net, 
	martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org, 
	yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org, 
	sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, 
	linux-kernel@...r.kernel.org, kernel-patches-bot@...com
Subject: Re: [PATCH bpf v3 0/4] bpf: Free special fields when update hash and
 local storage maps

On Wed, Oct 29, 2025 at 2:50 PM Leon Hwang <leon.hwang@...ux.dev> wrote:
>
>
>
> On 29/10/25 04:22, Andrii Nakryiko wrote:
> > On Tue, Oct 28, 2025 at 11:10 AM <patchwork-bot+netdevbpf@...nel.org> wrote:
[......]
> >   [  424.982379]  ? bpf_lru_pop_free+0x2c6/0x1a50
> >   [  424.982382]  bpf_lru_pop_free+0x2c6/0x1a50
>
> Right, this is the classic NMI vs spinlock deadlock:
>
> Process Context (CPU 0)         NMI Context (CPU 0)
> =======================         ===================
>
>     syscall()
>        |
>        +-> htab_lru_map_update_elem()
>        |
>        +-> bpf_lru_pop_free()
>        |
>        +-> spin_lock_irqsave(&lock)
>        |   +-------------------+
>        |   | LOCK ACQUIRED [Y] |
>        |   | IRQs DISABLED     |
>        |   +-------------------+
>        |
>        +-> [Critical Section]
>        |   |
>        |   | Working with LRU...
>        |   |
>        |   |                      +-----------------------+
>        |   |<---------------------+ ! NMI FIRES!          |
>        |   |                      +-----------------------+
>        |   |                      | (IRQs disabled but    |
>        |   |                      |  NMI ignores that!)   |
>        |   |                      +-----------------------+
>        |   |                                 |
>        |   | [INTERRUPTED]                   |
>        |   | [Context saved]                 |
>        |   |                                 v
>        |   |                     perf_event_nmi_handler()
>        |   |                                 |
>        |   |                                 +-> BPF program
>        |   |                                 |
>        |   |                                 +-> htab_lru_map_
>        |   |                                 |   update_elem()
>        |   |                                 |
>        |   |                                 +-> bpf_lru_pop_
>        |   |                                 |   free()
>        |   |                                 |
>        |   |                                 +-> spin_lock_
>        |   |                                 |   irqsave(&lock)
>        |   |                                 |   +------------+
>        |   |                                 |   | TRIES TO   |
>        |   |                                 |   | ACQUIRE    |
>        |   |                                 |   | SAME LOCK! |
>        |   |                                 |   +------------+
>        |   |                                 |        |
>        |   |                                 |        v
>        |   |                                 |   +------------+
>        |   |<--------------------------------+---+ ! DEADLOCK |
>        |   |                                 |   +------------+
>        |   |                                 |   | Lock held  |
>        |   | Still holding lock...           |   | by process |
>        |   | Waiting for NMI to finish ---+  |   | context    |
>        |   |                              |  |   |            |
>        |   |                              |  |   | NMI waits  |
>        |   |                              |  |   | for same   |
>        |   |                              |  |   | lock       |
>        |   |                              |  |   +------------+
>        |   |                              |  |        |
>        |   |                              |  |        v
>        |   |                              |  |   [Spin forever]
>        |   |                              |  |        |
>        |   |                              |  +--------+
>        |   |                              |  (Circular wait)
>        |   |                              |
>        |   |                              +-> SYSTEM HUNG
>        |   |
>        |   +-> [Never reached]
>        |
>        +-> spin_unlock_irqrestore(&lock)
>            [Never reached]
>
>
> +---------------------------------------------------------------------+
> |                       DEADLOCK SUMMARY                              |
> +---------------------------------------------------------------------+
> |                                                                     |
> | Process Context: Holds &loc_l->lock, waiting for NMI to finish      |
> |                                                                     |
> | NMI Context:     Trying to acquire &loc_l->lock                     |
> |                  (same lock, same CPU)                              |
> |                                                                     |
> | Result:          Both contexts wait for each other = DEADLOCK       |
> |                                                                     |
> +---------------------------------------------------------------------+
>
> We can fix this by converting the raw_spinlock_t to trylock-based
> approach, similar to the fix for ringbuf in
> commit a650d38915c1 ("bpf: Convert ringbuf map to rqspinlock").

Nice shot!

>
> In bpf_common_lru_pop_free(), we could use:
>
>     if (!raw_res_spin_lock_irqsave(&loc_l->lock, flags))
>         return NULL;
>
> However, this deadlock is pre-existing and not introduced by this
> series. It's better to send a separate fix for this deadlock.
>
> Hi Menglong, could you follow up on the deadlock fix?

Yeah, with pleasure. As I see, this is not the only place
that can cause deadlock and rqspinlock should be used. And
I'll follow up on such deadlocks.

Thanks!
Menglong Dong

>
> Thanks,
> Leon
>