Message-ID: <CANn89iJKeRYZh42MKvqLgLFwCSoti0dbSkreaOMSgmfWXzm-GA@mail.gmail.com>
Date: Tue, 4 Feb 2025 11:35:46 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S . Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Kuniyuki Iwashima <kuniyu@...zon.com>, Simon Horman <horms@...nel.org>, eric.dumazet@...il.com
Subject: Re: [PATCH v2 net 09/16] ipv4: icmp: convert to dev_net_rcu()
On Tue, Feb 4, 2025 at 5:57 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Feb 4, 2025 at 5:14 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Feb 4, 2025 at 12:36 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > >
> > > On Mon, 3 Feb 2025 14:30:39 +0000 Eric Dumazet wrote:
> > > > @@ -611,9 +611,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
> > > > goto out;
> > > >
> > > > if (rt->dst.dev)
> > > > - net = dev_net(rt->dst.dev);
> > > > + net = dev_net_rcu(rt->dst.dev);
> > > > else if (skb_in->dev)
> > > > - net = dev_net(skb_in->dev);
> > > > + net = dev_net_rcu(skb_in->dev);
> > > > else
> > > > goto out;
> > >
> > > Hm. Weird. NIPA says this one is not under RCU.
> > >
> > > [ 275.730657][ C1] ./include/net/net_namespace.h:404 suspicious rcu_dereference_check() usage!
> > > [ 275.731033][ C1]
> > > [ 275.731033][ C1] other info that might help us debug this:
> > > [ 275.731033][ C1]
> > > [ 275.731471][ C1]
> > > [ 275.731471][ C1] rcu_scheduler_active = 2, debug_locks = 1
> > > [ 275.731799][ C1] 1 lock held by swapper/1/0:
> > > [ 275.732000][ C1] #0: ffffc900001e0ae8 ((&n->timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x230
> > > [ 275.732354][ C1]
> > > [ 275.732354][ C1] stack backtrace:
> > > [ 275.732638][ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.13.0-virtme #1
> > > [ 275.732643][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > [ 275.732646][ C1] Call Trace:
> > > [ 275.732647][ C1] <IRQ>
> > > [ 275.732651][ C1] dump_stack_lvl+0xb0/0xd0
> > > [ 275.732663][ C1] lockdep_rcu_suspicious+0x1ea/0x280
> > > [ 275.732678][ C1] __icmp_send+0xb0d/0x1580
> > > [ 275.732695][ C1] ? tcp_data_queue+0x8/0x22d0
> > > [ 275.732701][ C1] ? lockdep_hardirqs_on_prepare+0x12b/0x410
> > > [ 275.732712][ C1] ? __pfx___icmp_send+0x10/0x10
> > > [ 275.732719][ C1] ? tcp_check_space+0x3ce/0x5f0
> > > [ 275.732742][ C1] ? rcu_read_lock_any_held+0x43/0xb0
> > > [ 275.732750][ C1] ? validate_chain+0x1fe/0xae0
> > > [ 275.732771][ C1] ? __pfx_validate_chain+0x10/0x10
> > > [ 275.732778][ C1] ? hlock_class+0x4e/0x130
> > > [ 275.732784][ C1] ? mark_lock+0x38/0x3e0
> > > [ 275.732788][ C1] ? sock_put+0x1a/0x60
> > > [ 275.732806][ C1] ? __lock_acquire+0xb9a/0x1680
> > > [ 275.732822][ C1] ipv4_send_dest_unreach+0x3b4/0x800
> > > [ 275.732829][ C1] ? neigh_invalidate+0x1c7/0x540
> > > [ 275.732837][ C1] ? __pfx_ipv4_send_dest_unreach+0x10/0x10
> > > [ 275.732850][ C1] ipv4_link_failure+0x1b/0x190
> > > [ 275.732856][ C1] arp_error_report+0x96/0x170
> > > [ 275.732862][ C1] neigh_invalidate+0x209/0x540
> > > [ 275.732873][ C1] neigh_timer_handler+0x87a/0xdf0
> > > [ 275.732883][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > > [ 275.732886][ C1] call_timer_fn+0x13b/0x230
> > > [ 275.732891][ C1] ? call_timer_fn+0xe8/0x230
> > > [ 275.732894][ C1] ? call_timer_fn+0xe8/0x230
> > > [ 275.732899][ C1] ? __pfx_call_timer_fn+0x10/0x10
> > > [ 275.732902][ C1] ? mark_lock+0x38/0x3e0
> > > [ 275.732920][ C1] __run_timers+0x545/0x810
> > > [ 275.732925][ C1] ? __pfx_neigh_timer_handler+0x10/0x10
> > > [ 275.732936][ C1] ? __pfx___run_timers+0x10/0x10
> > > [ 275.732939][ C1] ? __lock_release+0x103/0x460
> > > [ 275.732947][ C1] ? do_raw_spin_lock+0x131/0x270
> > > [ 275.732952][ C1] ? __pfx_do_raw_spin_lock+0x10/0x10
> > > [ 275.732956][ C1] ? lock_acquire+0x32/0xc0
> > > [ 275.732958][ C1] ? timer_expire_remote+0x96/0xf0
> > > [ 275.732967][ C1] timer_expire_remote+0x9e/0xf0
> > > [ 275.732970][ C1] tmigr_handle_remote_cpu+0x278/0x440
> > > [ 275.732977][ C1] ? __pfx_tmigr_handle_remote_cpu+0x10/0x10
> > > [ 275.732981][ C1] ? __pfx___lock_release+0x10/0x10
> > > [ 275.732985][ C1] ? __pfx_lock_acquire.part.0+0x10/0x10
> > > [ 275.733015][ C1] tmigr_handle_remote_up+0x1a6/0x270
> > > [ 275.733027][ C1] ? __pfx_tmigr_handle_remote_up+0x10/0x10
> > > [ 275.733036][ C1] __walk_groups.isra.0+0x44/0x160
> > > [ 275.733051][ C1] tmigr_handle_remote+0x20b/0x300
> > >
> > > Decoded:
> > > https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/976941/vm-crash-thr0-1
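
To spell out why lockdep fires here: the call chain above reaches __icmp_send() from a
timer callback (neigh_timer_handler() -> neigh_invalidate() -> arp_error_report() ->
ipv4_link_failure() -> ipv4_send_dest_unreach()) with only the timer lock held, so there
is no RCU read-side critical section around the dev_net_rcu() lookups. Roughly, and this
is a sketch of the assumed helper rather than the exact tree code, dev_net_rcu() boils
down to an rcu_dereference() of the device's netns pointer, and that is the
rcu_dereference_check() at net_namespace.h:404 that complains:

/* Sketch of what dev_net_rcu() is assumed to do (see net_namespace.h) */
static inline struct net *dev_net_rcu(const struct net_device *dev)
{
	/*
	 * rcu_dereference() expects rcu_read_lock() (or an equivalent
	 * read-side marker) to be held; the neigh timer path above does
	 * not provide one, hence the splat.
	 */
	return rcu_dereference(dev->nd_net.net);
}

So the conversion itself is not the issue; this particular caller needs an explicit
rcu_read_lock() around the lookup.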
> >
> > Oops, I thought I ran the tests on the whole series. I missed this one.
>
> BTW, ICMPv6 has the same potential problem, I will amend both cases.
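
For reference, the amendment could look roughly like the sketch below; this is an
assumption about its shape, not the actual v3 diff (the label out_unlock is made up
here, the real patch may extend the read-side section further and make the matching
change in icmp6_send()):

	/* Hypothetical sketch of the __icmp_send() change, not the v3 patch */
	rcu_read_lock();

	if (rt->dst.dev)
		net = dev_net_rcu(rt->dst.dev);
	else if (skb_in->dev)
		net = dev_net_rcu(skb_in->dev);
	else
		goto out_unlock;

	/* ... rest of __icmp_send() runs with the netns valid under RCU ... */

out_unlock:
	rcu_read_unlock();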
I ran the tests again for v3 and got an unrelated crash, FYI.
[14237.095216] #PF: supervisor instruction fetch in kernel mode
[14237.095570] #PF: error_code(0x0010) - not-present page
[14237.095915] PGD 1e58067 P4D 1e58067 PUD ce1c067 PMD 0
[14237.096991] Oops: Oops: 0010 [#1] SMP DEBUG_PAGEALLOC NOPTI
[14237.097507] CPU: 0 UID: 0 PID: 6371 Comm: python3 Not tainted 6.13.0-virtme #1559
[14237.098045] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[14237.098578] RIP: 0010:0x0
[14237.099324] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[14237.099752] RSP: 0018:ffffacfd4486bed0 EFLAGS: 00000286
[14237.100079] RAX: 0000000000000000 RBX: ffff9af502607200 RCX: 0000000000000002
[14237.100452] RDX: 00007fffc684a690 RSI: 0000000000005401 RDI: ffff9af502607200
[14237.100821] RBP: 0000000000005401 R08: 0000000000000001 R09: 0000000000000000
[14237.101182] R10: 0000000000000001 R11: 0000000000000000 R12: 00007fffc684a690
[14237.101542] R13: ffff9af50888ed68 R14: ffff9af502607200 R15: 0000000000000000
[14237.102372] FS: 00007f76b73f95c0(0000) GS:ffff9af57cc00000(0000) knlGS:0000000000000000
[14237.102372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14237.102679] CR2: ffffffffffffffd6 CR3: 00000000039ca000 CR4: 00000000000006f0
[14237.103160] Call Trace:
[14237.103435] <TASK>
[14237.103720] ? __die_body.cold+0x19/0x26
[14237.104340] ? page_fault_oops+0x134/0x2a0
[14237.104553] ? cp_new_stat+0x157/0x190
[14237.104799] ? exc_page_fault+0x68/0x230
[14237.105013] ? asm_exc_page_fault+0x26/0x30
[14237.105259] full_proxy_unlocked_ioctl+0x63/0x90
[14237.105546] __x64_sys_ioctl+0x97/0xc0
[14237.105754] do_syscall_64+0x72/0x180
[14237.105949] entry_SYSCALL_64_after_hwframe+0x76/0x7e