Message-ID: <AM4PR07MB1714B9C0CC4315A62079E26A9A250@AM4PR07MB1714.eurprd07.prod.outlook.com>
Date: Mon, 27 Nov 2017 15:46:22 +0000
From: Jon Maloy <jon.maloy@...csson.com>
To: Tommi Rantala <tommi.t.rantala@...ia.com>,
Ying Xue <ying.xue@...driver.com>,
"David S. Miller" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"tipc-discussion@...ts.sourceforge.net"
<tipc-discussion@...ts.sourceforge.net>,
open list <linux-kernel@...r.kernel.org>
Subject: RE: tipc_node_calculate_timer div-by-zero
Hi Tommi,
Yes, this clearly doesn't look right. tipc_rcv() should never be called unless the bearer exists and is up.
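To make the failure mode concrete, here is a minimal user-space sketch of the interval arithmetic that the reported trace points at. It is an illustration only, not the kernel source: the names model_abort_limit and MODEL_MAX_INTV are hypothetical, and the explicit zero check exists only so the model can report the problem instead of trapping. The assumption is that a bearer which is not yet initialized hands over a tolerance of 0.

```c
#include <stdio.h>

/* Hypothetical cap on the keepalive interval, for illustration only. */
#define MODEL_MAX_INTV 500UL

/*
 * Model of the timer calculation: derive an interval from the link
 * tolerance, keep the lowest interval seen so far, then divide the
 * tolerance by it. Returns 0 and stores the result in *limit, or -1
 * when the divisor would be zero (where the real code would fault).
 */
static int model_abort_limit(unsigned long tol, unsigned long *keepalive_intv,
                             unsigned long *limit)
{
    unsigned long intv = (tol / 4 > MODEL_MAX_INTV) ? MODEL_MAX_INTV : tol / 4;

    /* The lowest interval wins. */
    if (intv < *keepalive_intv)
        *keepalive_intv = intv;

    /* Assumption: this guard is added for the model only. */
    if (*keepalive_intv == 0)
        return -1;

    *limit = tol / *keepalive_intv;
    return 0;
}
```

With a tolerance of 1500 and a starting interval of 1000, the model settles on an interval of 375 and a limit of 4. With a tolerance of 0 the interval collapses to 0 and the division cannot be performed, which matches the zero divisor (RCX) visible in the register dump.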
The simplest and safest way to fix this is just to remove the second call to tipc_rcv(); there will be more discovery messages arriving later.
An alternative is to move the (user == LINK_CONFIG) test to before the (b && test_bit()) test. This is nicer, but will cost another few CPU cycles on the most critical data path. I think I favor the first approach.
Maybe something like this:
    if (likely(b && test_bit(0, &b->up))) {
            tipc_rcv(sock_net(sk), skb, b);
    } else if (msg_user(hdr) == LINK_CONFIG) {
            tipc_udp_rcast_disc(b, skb);
            kfree_skb(skb);
    }
    rcu_read_unlock();
    return 0;
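
The decision made by this snippet can be modelled in user space as a small pure function. This is only a sketch under stated assumptions: every name below (model_bearer, model_user, model_action, model_udp_recv) is hypothetical, and treating a missing bearer as a plain drop is an assumption about the surrounding code, not something the snippet itself shows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the message user field and the outcome. */
enum model_user { MODEL_USER_DATA, MODEL_USER_LINK_CONFIG };
enum model_action { MODEL_DELIVER, MODEL_DISC_ONLY, MODEL_DROP };

/* Hypothetical stand-in for the bearer; only the "up" state matters here. */
struct model_bearer {
    bool up;
};

/*
 * Decide what to do with an incoming packet:
 *  - bearer exists and is up: hand it to the normal receive path;
 *  - bearer exists but is not up, and the packet is a discovery
 *    message: record the peer only, then free the buffer;
 *  - anything else: drop (assumption for the model).
 */
static enum model_action model_udp_recv(const struct model_bearer *b,
                                        enum model_user user)
{
    if (b && b->up)
        return MODEL_DELIVER;
    if (b && user == MODEL_USER_LINK_CONFIG)
        return MODEL_DISC_ONLY;
    return MODEL_DROP;
}
```

The point of the ordering is that the common case (bearer up) is decided by the first test alone, so the extra comparison is only paid on the rare path.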
If you want you can send this to 'net' yourself. Otherwise I can do it, but that would take a couple of days.
///jon
> -----Original Message-----
> From: Tommi Rantala [mailto:tommi.t.rantala@...ia.com]
> Sent: Monday, November 27, 2017 04:51
> To: Jon Maloy <jon.maloy@...csson.com>; Ying Xue
> <ying.xue@...driver.com>; David S. Miller <davem@...emloft.net>;
> netdev@...r.kernel.org; tipc-discussion@...ts.sourceforge.net; open list
> <linux-kernel@...r.kernel.org>
> Subject: tipc_node_calculate_timer div-by-zero
>
> Hi,
>
> I'm seeing a rare TIPC div-by-zero crash in tipc_node_calculate_timer().
>
> If I get it right, we're receiving a discovery packet while enabling a bearer.
> The bearer is not yet fully initialized, causing the discovery packet processing
> to use a zero tolerance value, which then causes the div-by-zero in
> tipc_node_calculate_timer().
>
> I can reproduce the issue easily by adding some sleep in
> tipc_enable_bearer(), and the machine always crashes after booting it up:
>
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index 47ec121574ce..00bdd87cd614 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -327,12 +327,16 @@ static int tipc_enable_bearer(struct net *net, const char *name,
> return -EINVAL;
> }
>
> + pr_warn("HACK: sleeping 2 seconds!");
> + usleep_range(2000000, 2100000);
> +
> b->identity = bearer_id;
> b->tolerance = m->tolerance;
> b->window = m->window;
>
>
> In 4.14 removing the latter tipc_rcv() call in tipc_udp_recv() [which was
> added in commit c9b64d492b1f (tipc: add replicast peer discovery)] seems
> to fix the crash, but I do not really understand the code:
>
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index ecca64fc6a6f..371653acf1f6 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -371,9 +371,11 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
> goto rcu_out;
> }
>
> + /*
> tipc_rcv(sock_net(sk), skb, b);
> rcu_read_unlock();
> return 0;
> + */
>
> rcu_out:
> rcu_read_unlock();
>
>
>
> I have tested 4.4 and 4.14, and the bug is reproducible in both versions.
>
> Comments?
>
> -Tommi
>
>
>
> [ 12.576885] tipc: Activated (version 2.0.0)
> [ 12.577506] NET: Registered protocol family 30
> [ 12.578212] tipc: Started in single node mode
> [ 12.589907] Started in network mode
> [ 12.590450] Own node address <1.1.1>, network identity 1
> [ 12.592022] HACK: sleeping 2 seconds!
> [ 12.668088] divide error: 0000 [#1] SMP
> [ 12.670993] Modules linked in: tipc ip6_udp_tunnel udp_tunnel nfsd
> auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc isofs aesni_intel
> aes_x86_64 crypto_simd cryptd glue_helper ata_piix sch_fq_codel autofs4
> [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted
> 4.14.2-pc64-dirty #1
> [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-2.fc27 04/01/2014
> [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
> [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
> [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
> [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX:
> 0000000000000000
> [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI:
> ffff8c2a5b382600
> [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09:
> 0000000000000001
> [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffff8c2a5d8949d8
> [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15:
> ffff8c2a5d894800
> [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000)
> knlGS:0000000000000000
> [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4:
> 00000000003606e0
> [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 12.712627] Call Trace:
> [ 12.713390] <IRQ>
> [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc]
> [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc]
> [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc]
> [ 12.717396] ? minmax_running_min+0x2f/0x60
> [ 12.718248] ? dst_alloc+0x4c/0xa0
> [ 12.718964] ? tcp_ack+0xaf1/0x10b0
> [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
> [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc]
> [ 12.721459] ? dst_alloc+0x4c/0xa0
> [ 12.722130] udp_queue_rcv_skb+0x264/0x490
> [ 12.722924] __udp4_lib_rcv+0x21e/0x990
> [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0
> [ 12.724442] ? tcp_v4_rcv+0x958/0xa40
> [ 12.725039] udp_rcv+0x1a/0x20
> [ 12.725587] ip_local_deliver_finish+0x97/0x1d0
> [ 12.726323] ip_local_deliver+0xaf/0xc0
> [ 12.726959] ? ip_route_input_noref+0x19/0x20
> [ 12.727689] ip_rcv_finish+0xdd/0x3b0
> [ 12.728307] ip_rcv+0x2ac/0x360
> [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90
> [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0
> [ 12.730274] __netif_receive_skb+0x1d/0x60
> [ 12.730953] ? __netif_receive_skb+0x1d/0x60
> [ 12.731637] netif_receive_skb_internal+0x37/0xd0
> [ 12.732371] napi_gro_receive+0xc7/0xf0
> [ 12.732920] receive_buf+0x3c3/0xd40
> [ 12.733441] virtnet_poll+0xb1/0x250
> [ 12.733944] net_rx_action+0x23e/0x370
> [ 12.734476] __do_softirq+0xc5/0x2f8
> [ 12.734922] irq_exit+0xfa/0x100
> [ 12.735315] do_IRQ+0x4f/0xd0
> [ 12.735680] common_interrupt+0xa2/0xa2
> [ 12.736126] </IRQ>
> [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10
> [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff4d
> [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX:
> 0000000000000000
> [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09:
> ffffa41cc12c7e88
> [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000002
> [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15:
> 0000000000000000
> [ 12.741831] default_idle+0x2a/0x100
> [ 12.742323] arch_cpu_idle+0xf/0x20
> [ 12.742796] default_idle_call+0x28/0x40
> [ 12.743312] do_idle+0x179/0x1f0
> [ 12.743761] cpu_startup_entry+0x1d/0x20
> [ 12.744291] start_secondary+0x112/0x120
> [ 12.744816] secondary_startup_64+0xa5/0xa5
> [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
> 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
> 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
> [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
> RSP: ffff8c2a7fc838a0
> [ 12.748555] ---[ end trace 1399ab83390650fd ]---
> [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt
> [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 12.751215] Rebooting in 60 seconds..
>
>
> [ 12.561265] tipc: Activated (version 2.0.0)
> [ 12.561997] NET: Registered protocol family 30
> [ 12.562686] tipc: Started in single node mode
> [ 12.569696] Started in network mode
> [ 12.570401] Own node address <1.1.1>, network identity 1
> [ 12.592022] HACK: sleeping 2 seconds!
> [ 13.792592] divide error: 0000 [#1] SMP
> [ 13.793539] Modules linked in: tipc ...
> [ 13.799203] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.101-dirty #9
> [ 13.800235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-2.fc27 04/01/2014
> [ 13.801531] task: ffff8802355ebc00 ti: ffff8802355f8000 task.ti:
> ffff8802355f8000
> [ 13.803134] RIP: 0010:[<ffffffffa03b6252>] [<ffffffffa03b6252>]
> tipc_node_calculate_timer.isra.6+0x52/0x60 [tipc]
> [ 13.804974] RSP: 0018:ffff88023fc838a0 EFLAGS: 00010246
> [ 13.805806] RAX: 0000000000000000 RBX: ffff88023fc83996 RCX:
> 0000000000000000
> [ 13.806826] RDX: 0000000000000000 RSI: ffff8802275fd454 RDI:
> 0000000000000000
> [ 13.807890] RBP: ffff88023fc838b0 R08: 0000000000000001 R09:
> 0000000000000001
> [ 13.808915] R10: 0000000000000000 R11: ffff8802275fd470 R12:
> ffff8802275fd460
> [ 13.809865] R13: ffff8802275fd1d8 R14: 0000000000000000 R15:
> ffff8802275fd000
> [ 13.810932] FS: 0000000000000000(0000) GS:ffff88023fc80000(0000)
> knlGS:0000000000000000
> [ 13.812152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 13.813140] CR2: 00007fa9aac6b6e0 CR3: 00000000b2e96000 CR4:
> 00000000003406e0
> [ 13.814257] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 13.815513] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 13.816727] Stack:
> [ 13.817256] ffff88023258b400 0000000000000000 ffff88023fc83960
> ffffffffa03b7102
> [ 13.819080] ffffffff00000000 0000000000000000 ffffea000000edd8
> 0000000001001001
> [ 13.820998] ffff880001001004 ffff880200000002 ffff8800b928b000
> ffff8802275fcc00
> [ 13.822880] Call Trace:
> [ 13.823472] <IRQ>
> [ 13.823869] [<ffffffffa03b7102>] tipc_node_check_dest+0x342/0x3f0 [tipc]
> [ 13.825389] [<ffffffffa03af2db>] tipc_disc_rcv+0x14b/0x1d0 [tipc]
> [ 13.826571] [<ffffffffa03b7ab1>] tipc_rcv+0x6d1/0xc40 [tipc]
> [ 13.827619] [<ffffffff8207dde5>] ? wake_up_process+0x15/0x20
> [ 13.828656] [<ffffffffa0288de0>] ? svc_xprt_do_enqueue+0x100/0x240
> [sunrpc]
> [ 13.829865] [<ffffffffa03be3c0>] ? tipc_udp_send_msg+0x240/0x240 [tipc]
> [ 13.831033] [<ffffffffa03be401>] tipc_udp_recv+0x41/0x80 [tipc]
> [ 13.832126] [<ffffffff82500c77>] udp_queue_rcv_skb+0x327/0x440
> [ 13.833234] [<ffffffff825013c4>] __udp4_lib_rcv+0x154/0x6d0
> [ 13.834290] [<ffffffff824cf321>] ? ip_route_input_noref+0x331/0xe20
> [ 13.835410] [<ffffffff82501e4a>] udp_rcv+0x1a/0x20