Message-ID: <CANn89iKvgROcpdCJu726x=jCYNnXLwW=1RN5XR0Q_kbON15zng@mail.gmail.com>
Date: Mon, 3 Nov 2025 04:36:45 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Gal Pressman <gal@...dia.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, netdev@...r.kernel.org, 
	linux-rt-devel@...ts.linux.dev, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, 
	syzbot+8715dd783e9b0bef43b1@...kaller.appspotmail.com
Subject: Re: [PATCH net] net: gro_cells: Use nested-BH locking for gro_cell

On Mon, Nov 3, 2025 at 4:20 AM Gal Pressman <gal@...dia.com> wrote:
>
> On 09/10/2025 12:43, Sebastian Andrzej Siewior wrote:
> > The gro_cell data structure is a per-CPU variable and relies on disabled
> > BH for its locking. Without per-CPU locking in local_bh_disable() on
> > PREEMPT_RT, this data structure requires explicit locking.
> >
> > Add a local_lock_t to the data structure and use
> > local_lock_nested_bh() for locking. This change adds only lockdep
> > coverage and does not alter the functional behaviour for !PREEMPT_RT.
> >
> > Reported-by: syzbot+8715dd783e9b0bef43b1@...kaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/68c6c3b1.050a0220.2ff435.0382.GAE@google.com/
> > Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
>
> Hello Sebastian,
>
> This patch results in the following lockdep warning [1] when running
> IPsec + vxlan tests; could you please take a look?
>
> If needed, you can see the test here [2], though it might be a bit
> outdated.
>
> FWIW, Eric's patch [3] did not solve this issue.
>
> [1]
>
>  [ 6953.101639] ============================================
>  [ 6953.103703] WARNING: possible recursive locking detected
>  [ 6953.105293] 6.18.0-rc3_for_upstream_debug_2025_10_30_15_01 #1 Not tainted
>  [ 6953.107235] --------------------------------------------
>  [ 6953.108814] swapper/0/0 is trying to acquire lock:
>  [ 6953.109926] ffffe8ffff8234b0 (&cell->bh_lock){+.-.}-{3:3}, at: gro_cells_receive+0x3a2/0x8e0
>  [ 6953.110756]
>  [ 6953.110756] but task is already holding lock:
>  [ 6953.111377] ffff8884d3a52700 (&cell->bh_lock){+.-.}-{3:3}, at: gro_cell_poll+0x86/0x560
>  [ 6953.112163]
>  [ 6953.112163] other info that might help us debug this:
>  [ 6953.112831]  Possible unsafe locking scenario:
>  [ 6953.112831]
>  [ 6953.113460]        CPU0
>  [ 6953.113768]        ----
>  [ 6953.114075]   lock(&cell->bh_lock);
>  [ 6953.114468]   lock(&cell->bh_lock);
>  [ 6953.114854]
>  [ 6953.114854]  *** DEADLOCK ***
>  [ 6953.114854]
>  [ 6953.115529]  May be due to missing lock nesting notation
>  [ 6953.115529]
>  [ 6953.116233] 5 locks held by swapper/0/0:
>  [ 6953.116652]  #0: ffff8884d3a52700 (&cell->bh_lock){+.-.}-{3:3}, at: gro_cell_poll+0x86/0x560
>  [ 6953.117606]  #1: ffffffff8506b9a0 (rcu_read_lock){....}-{1:3}, at: netif_receive_skb_list_internal+0x309/0xfa0
>  [ 6953.119051]  #2: ffffffff8506b9a0 (rcu_read_lock){....}-{1:3}, at: ip_local_deliver_finish+0x2dc/0x5e0
>  [ 6953.120345]  #3: ffffffff8506b9a0 (rcu_read_lock){....}-{1:3}, at: vxlan_rcv+0xa94/0x4180 [vxlan]
>  [ 6953.121737]  #4: ffffffff8506b9a0 (rcu_read_lock){....}-{1:3}, at: gro_cells_receive+0x4a/0x8e0
>  [ 6953.123062]
>  [ 6953.123062] stack backtrace:
>  [ 6953.123603] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc3_for_upstream_debug_2025_10_30_15_01 #1 NONE
>  [ 6953.123609] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  [ 6953.123614] Call Trace:
>  [ 6953.123619]  <IRQ>
>  [ 6953.123621]  dump_stack_lvl+0x69/0xa0
>  [ 6953.123628]  print_deadlock_bug.cold+0xbd/0xca
>  [ 6953.123633]  __lock_acquire+0x168b/0x2f60
>  [ 6953.123640]  lock_acquire+0x10e/0x300
>  [ 6953.123643]  ? gro_cells_receive+0x3a2/0x8e0
>  [ 6953.123647]  ? lock_acquire+0x10e/0x300
>  [ 6953.123651]  ? vxlan_rcv+0xa94/0x4180 [vxlan]
>  [ 6953.123663]  gro_cells_receive+0x3ab/0x8e0
>  [ 6953.123667]  ? gro_cells_receive+0x3a2/0x8e0
>  [ 6953.123671]  vxlan_rcv+0xbb7/0x4180 [vxlan]
>  [ 6953.123683]  ? encap_bypass_if_local+0x1e0/0x1e0 [vxlan]
>  [ 6953.123692]  ? nf_conntrack_double_lock+0xc3/0xd0
>  [ 6953.123698]  ? udp_queue_rcv_one_skb+0xa41/0x1420
>  [ 6953.123702]  udp_queue_rcv_one_skb+0xa41/0x1420
>  [ 6953.123706]  ? __udp_enqueue_schedule_skb+0x1160/0x1160
>  [ 6953.123711]  udp_unicast_rcv_skb+0x106/0x330
>  [ 6953.123714]  __udp4_lib_rcv+0xe55/0x3160
>  [ 6953.123720]  ? udp_sk_rx_dst_set+0x70/0x70
>  [ 6953.123723]  ? lock_acquire+0x10e/0x300
>  [ 6953.123727]  ip_protocol_deliver_rcu+0x7e/0x330
>  [ 6953.123732]  ip_local_deliver_finish+0x39d/0x5e0
>  [ 6953.123736]  ip_local_deliver+0x156/0x1a0
>  [ 6953.123740]  ip_sublist_rcv_finish+0x8f/0x260
>  [ 6953.123744]  ip_list_rcv_finish+0x45f/0x6a0
>  [ 6953.123749]  ? ip_rcv_finish+0x250/0x250
>  [ 6953.123752]  ? ip_rcv_finish_core+0x1fb0/0x1fb0
>  [ 6953.123756]  ? ip_rcv_core+0x5d8/0xcc0
>  [ 6953.123760]  ip_list_rcv+0x2dc/0x3f0
>  [ 6953.123765]  ? ip_rcv+0xa0/0xa0
>  [ 6953.123768]  ? __lock_acquire+0x834/0x2f60
>  [ 6953.123772]  __netif_receive_skb_list_core+0x479/0x880
>  [ 6953.123777]  ? __netif_receive_skb_core.constprop.0+0x42a0/0x42a0
>  [ 6953.123780]  ? lock_acquire+0x10e/0x300
>  [ 6953.123784]  netif_receive_skb_list_internal+0x671/0xfa0
>  [ 6953.123788]  ? inet_gro_receive+0x737/0xdb0
>  [ 6953.123791]  ? process_backlog+0x1310/0x1310
>  [ 6953.123794]  ? find_held_lock+0x2b/0x80
>  [ 6953.123796]  ? dev_gro_receive+0x11ad/0x3320
>  [ 6953.123799]  ? dev_gro_receive+0x11ad/0x3320
>  [ 6953.123802]  ? dev_gro_receive+0x21c/0x3320
>  [ 6953.123806]  napi_complete_done+0x1a3/0x7b0
>  [ 6953.123809]  ? netif_receive_skb_list+0x380/0x380
>  [ 6953.123813]  gro_cell_poll+0x23a/0x560
>  [ 6953.123818]  __napi_poll.constprop.0+0x9d/0x4e0
>  [ 6953.123821]  net_rx_action+0x489/0xdf0
>  [ 6953.123826]  ? __napi_poll.constprop.0+0x4e0/0x4e0
>  [ 6953.123828]  ? do_raw_spin_trylock+0x150/0x180
>  [ 6953.123832]  ? do_raw_spin_lock+0x129/0x260
>  [ 6953.123835]  ? __rwlock_init+0x150/0x150
>  [ 6953.123840]  handle_softirqs+0x192/0x810
>  [ 6953.123846]  irq_exit_rcu+0x106/0x190
>  [ 6953.123849]  common_interrupt+0x90/0xb0
>  [ 6953.123855]  </IRQ>
>  [ 6953.123856]  <TASK>
>  [ 6953.123858]  asm_common_interrupt+0x22/0x40
>  [ 6953.123861] RIP: 0010:pv_native_safe_halt+0x13/0x20
>  [ 6953.123867] Code: 33 00 00 00 48 2b 05 b4 aa 94 00 c3 cc cc cc cc cc cc cc cc cc cc cc 8b 05 ea 5a 74 02 85 c0 7e 07 0f 00 2d df bf 0e 00 fb f4 <c3> cc cc cc cc cc cc cc cc cc cc cc cc 41 54 55 53 48 89 fb 48 83
>  [ 6953.123870] RSP: 0018:ffffffff84e07e08 EFLAGS: 00000246
>  [ 6953.123874] RAX: 0000000000000000 RBX: ffffffff84e2f540 RCX: ffffffff83d92d2c
>  [ 6953.123876] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff817cf030
>  [ 6953.123878] RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed109a7484fa
>  [ 6953.123880] R10: ffff8884d3a427d3 R11: 0000000000000000 R12: fffffbfff09c5ea8
>  [ 6953.123882] R13: ffffffff85a954a0 R14: 1ffffffff09c0fc7 R15: 0000000000000000
>  [ 6953.123885]  ? ct_kernel_exit.constprop.0+0xac/0xd0
>  [ 6953.123889]  ? do_idle+0x300/0x3d0
>  [ 6953.123894]  default_idle+0x5/0x10
>  [ 6953.123897]  default_idle_call+0x66/0xa0
>  [ 6953.123899]  do_idle+0x300/0x3d0
>  [ 6953.123903]  ? arch_cpu_idle_exit+0x30/0x30
>  [ 6953.123906]  ? __schedule+0xdcc/0x3180
>  [ 6953.123910]  ? do_idle+0x18/0x3d0
>  [ 6953.123913]  cpu_startup_entry+0x50/0x60
>  [ 6953.123917]  rest_init+0x20a/0x210
>  [ 6953.123920]  start_kernel+0x3aa/0x3b0
>  [ 6953.123924]  x86_64_start_reservations+0x20/0x20
>  [ 6953.123928]  x86_64_start_kernel+0x11d/0x120
>  [ 6953.123932]  common_startup_64+0x129/0x138
>  [ 6953.123939]  </TASK>
>
> [2] https://github.com/Mellanox/ovs-tests/blob/master/ipsec-tests/test-ipsec-crypto-vxlan.sh
> [3] https://lore.kernel.org/netdev/20251020161114.1891141-1-edumazet@google.com/

Adding LOCKDEP annotations would be needed (like what we do in
netdev_lockdep_set_classes()). Note that the two acquisitions in the splat
are on different gro_cell instances (the lock addresses differ); they only
share a single lock class, so lockdep reports false recursion when devices
are stacked, as with vxlan running over its underlay here.
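
For illustration, a rough sketch of that kind of per-instance annotation, in
the spirit of netdev_lockdep_set_classes(); gro_cells_lockdep_set_class() and
the per-device key are hypothetical, not existing gro_cells API:

/* Hypothetical helper: give each device's gro_cells its own lockdep
 * class so that stacked devices (vxlan over its underlay here) stop
 * sharing one class.  Assumes lockdep_set_class() can be applied to
 * the local_lock_t dep_map as it is for spinlocks; it compiles to a
 * no-op when lockdep is disabled.
 */
static void gro_cells_lockdep_set_class(struct gro_cells *gcells,
					struct lock_class_key *key)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct gro_cell *cell = per_cpu_ptr(gcells->cells, cpu);

		lockdep_set_class(&cell->bh_lock, key);
	}
}

A caller such as vxlan would then pass a driver-local static struct
lock_class_key after gro_cells_init(), mirroring how
netdev_lockdep_set_classes() hands out per-lock keys.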

Or I would try something like:

diff --git a/net/core/gro_cells.c b/net/core/gro_cells.c
index fd57b845de333ff0e397eeb95aa67926d4e4a730..2cac53439f62addbd8562a66b28bf01bdbddf02c 100644
--- a/net/core/gro_cells.c
+++ b/net/core/gro_cells.c
@@ -60,9 +60,11 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)
        struct sk_buff *skb;
        int work_done = 0;

-       __local_lock_nested_bh(&cell->bh_lock);
        while (work_done < budget) {
+               __local_lock_nested_bh(&cell->bh_lock);
                skb = __skb_dequeue(&cell->napi_skbs);
+               __local_unlock_nested_bh(&cell->bh_lock);
+
                if (!skb)
                        break;
                napi_gro_receive(napi, skb);
@@ -71,7 +73,6 @@ static int gro_cell_poll(struct napi_struct *napi, int budget)

        if (work_done < budget)
                napi_complete_done(napi, work_done);
-       __local_unlock_nested_bh(&cell->bh_lock);
        return work_done;
 }
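
With that change applied, gro_cell_poll() would read roughly as below (the
container_of() line comes from the existing function, not from the hunk).
The point of the reshuffle is that bh_lock now covers only the queue
dequeue, so it is no longer held across napi_gro_receive(), which is where
GRO can deliver the decapsulated packet and re-enter gro_cells_receive() on
a stacked device's cell:

static int gro_cell_poll(struct napi_struct *napi, int budget)
{
	struct gro_cell *cell = container_of(napi, struct gro_cell, napi);
	struct sk_buff *skb;
	int work_done = 0;

	while (work_done < budget) {
		/* Hold the nested-BH lock only for the dequeue... */
		__local_lock_nested_bh(&cell->bh_lock);
		skb = __skb_dequeue(&cell->napi_skbs);
		__local_unlock_nested_bh(&cell->bh_lock);

		if (!skb)
			break;
		/* ...so it is not held here, where napi_gro_receive()
		 * can recurse into gro_cells_receive() of a nested
		 * device.
		 */
		napi_gro_receive(napi, skb);
		work_done++;
	}

	if (work_done < budget)
		napi_complete_done(napi, work_done);
	return work_done;
}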
