Message-ID: <859c86f2-5d57-c538-8f61-6f933d239ada@gmx.ch>
Date: Wed, 31 Jan 2018 21:26:51 +0100
From: Markus Berner <Markus.Berner@....ch>
To: steffen.klassert@...unet.com, netdev-list@...oetigt.de
Cc: netdev@...r.kernel.org
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in
xfrm_lookup
> I'm running into a NULL pointer dereference after updating from Linux
> 4.1.6 to 4.14.11 (see kernel log below).
We are running into the same problem on our production machine, which
runs CoreOS 1576.5.0 Stable with the 4.14.11 kernel on a KVM cloud VM.
In our case it is not as easy to reproduce, though: we observed a total
of 5 crashes in the last 2 weeks, all except one on the production machine.
> I still can't reproduce it with my tests. This is probably some race
> triggered due to your aggressive roadwarrior setup which I don't have.
We have a setup similar to Tobias's:
- 2 Network Interfaces (KVM/virtio): Public and local VLAN
- strongSwan VPN in tunnel mode between the local VLAN and our on-premise
network, running in a Docker container
- Quite a few iptables NAT and forwarding rules regarding other local
Docker containers
Some observations:
- The workaround Tobias described a while back, pinning the IRQs of the
Rx/Tx queues of all network interfaces to CPU0, did not prevent the
crashes in our case.
- The bug does not seem to correlate with load in our case, but load in
general is quite low.
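
For readers unfamiliar with the IRQ pinning workaround mentioned above, here is a minimal Python sketch of the idea. It is not taken from this thread: the exact commands Tobias used are not quoted here, the sample /proc/interrupts text is made up for illustration, and the helper names (cpu_mask, queue_irqs, affinity_writes) are hypothetical. The sketch only computes which /proc/irq/N/smp_affinity files would be written and with which hex CPU mask; it does not write anything.

```python
import re

# Illustrative /proc/interrupts excerpt (assumed layout for a virtio_net
# guest; real IRQ numbers and counters will differ on every machine).
SAMPLE = """\
           CPU0       CPU1
 24:          0          0   PCI-MSI 49152-edge      virtio0-config
 25:      10234       5321   PCI-MSI 49153-edge      virtio0-input.0
 26:       8120       4410   PCI-MSI 49154-edge      virtio0-output.0
 27:        311        290   PCI-MSI 65536-edge      virtio1-req.0
"""


def cpu_mask(cpus):
    """Return the hex bitmask that smp_affinity expects for the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def queue_irqs(interrupts_text, pattern=r"virtio\d+-(input|output)\.\d+"):
    """Return IRQ numbers whose description matches an Rx/Tx queue name."""
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)
        if m and re.search(pattern, line):
            irqs.append(int(m.group(1)))
    return irqs


def affinity_writes(interrupts_text, cpus=(0,)):
    """List (path, mask) pairs that would pin every queue IRQ to `cpus`."""
    mask = cpu_mask(cpus)
    return [(f"/proc/irq/{irq}/smp_affinity", mask)
            for irq in queue_irqs(interrupts_text)]


if __name__ == "__main__":
    for path, mask in affinity_writes(SAMPLE):
        print(f"would write {mask} to {path}")
```

On a real system the input would come from reading /proc/interrupts, and the writes require root. As noted above, pinning everything to CPU0 did not prevent the crashes in our case.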
I am happy to help if I can, but unfortunately our possibilities are a
bit limited, both because we lack kernel development know-how and
because it is difficult to try out configuration changes on the
production machine. I subscribed to LKML only now to respond, so I hope
the reply works (and goes to the correct message).
Markus
Example Stack Trace below:
[740051.374799] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[740051.379386] IP: xfrm_lookup+0x32/0x8a0
[740051.379941] PGD 80000004648d6067 P4D 80000004648d6067 PUD 461405067 PMD 0
[740051.380697] Oops: 0000 [#1] SMP PTI
[740051.381060] Modules linked in: iptable_mangle drbg authenc echainiv
esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel cbc binfmt_misc veth netconsole
configfs softdog xt_nat nf_log_ipv4 nf_log_common xt_LOG xt_limit
xt_policy xt_comment xt_multiport ipt_MASQUERADE nf_nat_masquerade_ipv4
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter
xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic br_netfilter
bridge stp llc overlay sb_edac edac_core nls_ascii nls_cp437 kvm_intel
vfat fat kvm mousedev psmouse i2c_piix4 irqbypass evdev virtio_balloon
i2c_core pvpanic button sch_fq_codel hid_generic usbhid hid ext4 crc16
mbcache jbd2 fscrypto dm_verity dm_bufio virtio_blk virtio_net uhci_hcd
ehci_pci ata_piix ehci_hcd crc32c_intel
[740051.389167] libata virtio_pci usbcore virtio_ring scsi_mod virtio
usb_common dm_mirror dm_region_hash dm_log dm_mod dax
[740051.391444] CPU: 2 PID: 13516 Comm: java Not tainted 4.14.11-coreos #1
[740051.392792] Hardware name: QEMU CloudSigma, BIOS Bochs 01/01/2011
[740051.394120] task: ffff903022738000 task.stack: ffffa791c7680000
[740051.395399] RIP: 0010:xfrm_lookup+0x32/0x8a0
[740051.396456] RSP: 0018:ffff9030bfc838e8 EFLAGS: 00010246
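[740051.397656] RAX: 0000000000000000 RBX: ffff9030bfc83960 RCX: 0000000000000000
[740051.399526] RDX: ffff9030bfc83960 RSI: 0000000000000000 RDI: 0000000000000000
[740051.401471] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000046d7a6d9
[740051.403260] R10: 00000000ffffffff R11: 0000000062f99322 R12: 0000000000000002
[740051.405049] R13: ffffffff980e3080 R14: ffff903095c8a0a0 R15: ffffffff9812dc20
[740051.406767] FS: 00007f7d68cf6700(0000) GS:ffff9030bfc80000(0000) knlGS:0000000000000000
[740051.408900] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[740051.410159] CR2: 0000000000000020 CR3: 00000005fd444005 CR4: 00000000001606e0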
[740051.406767] FS: 00007f7d68cf6700(0000) GS:ffff9030bfc80000(0000)
knlGS:0000000000000000
[740051.408900] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[740051.410159] CR2: 0000000000000020 CR3: 00000005fd444005 CR4:
00000000001606e0
[740051.411805] Call Trace:
[740051.412534] <IRQ>
[740051.413190] __xfrm_route_forward+0x61/0x100
[740051.414198] ip_forward+0x39e/0x470
[740051.415148] ? ip_rcv_finish+0xa5/0x3f0
[740051.416225] br_netfilter_enable+0x10c/0x3e0 [br_netfilter]
[740051.417491] nf_hook_slow+0x39/0xb0
[740051.418530] ip_rcv+0x303/0x3a0
[740051.419647] ? inet_del_offload+0x40/0x40
[740051.420303] __netif_receive_skb_core+0x2c9/0xb60
[740051.420998] ? x2apic_send_IPI+0x46/0x50
[740051.421648] ? check_preempt_curr+0x56/0x90
[740051.422309] ? ttwu_do_wakeup+0x19/0x150
[740051.422920] ? netif_receive_skb_internal+0x42/0xf0
[740051.423710] netif_receive_skb_internal+0x42/0xf0
[740051.424717] br_port_flags_change+0x1d4/0x260 [bridge]
[740051.425884] ? br_fdb_update+0xc3/0x2c0 [bridge]
[740051.427019] br_handle_frame_finish+0x1e2/0x510 [bridge]
[740051.428339] ? lock_timer_base+0x67/0x80
[740051.429036] ? ipt_do_table+0x35f/0x610
[740051.429906] ? br_port_flags_change+0x260/0x260 [bridge]
[740051.430683] br_nf_hook_thresh+0xde/0x12a0 [br_netfilter]
[740051.431465] ? br_port_flags_change+0x260/0x260 [bridge]
[740051.432217] br_nf_hook_thresh+0xa8c/0x12a0 [br_netfilter]
[740051.433032] ? br_port_flags_change+0x260/0x260 [bridge]
[740051.433966] ? nf_nat_ipv4_in+0x28/0x80 [nf_nat_ipv4]
[740051.434848] br_nf_hook_thresh+0xe2a/0x12a0 [br_netfilter]
[740051.435639] ? br_nf_hook_thresh+0x910/0x12a0 [br_netfilter]
[740051.436775] nf_hook_slow+0x39/0xb0
[740051.437613] br_handle_frame+0x1f0/0x9b0 [bridge]
[740051.438515] ? br_port_flags_change+0x260/0x260 [bridge]
[740051.439566] __netif_receive_skb_core+0x3d2/0xb60
[740051.440477] ? process_backlog+0x92/0x140
[740051.441324] process_backlog+0x92/0x140
[740051.442146] net_rx_action+0x261/0x3a0
[740051.442774] __do_softirq+0xf7/0x285
[740051.443534] do_softirq_own_stack+0x2a/0x40
[740051.444395] </IRQ>
[740051.444986] do_softirq.part.15+0x3d/0x50
[740051.445823] __local_bh_enable_ip+0x55/0x60
[740051.446643] ip_finish_output2+0x18b/0x380
[740051.447496] ip_output+0x71/0xe0
[740051.448217] ? ip_fragment.constprop.47+0x80/0x80
[740051.471709] tcp_transmit_skb+0x524/0x9a0
[740051.472551] tcp_write_xmit+0x1e7/0xfb0
[740051.473370] __tcp_push_pending_frames+0x2d/0xd0
[740051.474294] tcp_sendmsg_locked+0x5ae/0xe10
[740051.475167] tcp_sendmsg+0x27/0x40
[740051.475905] sock_sendmsg+0x30/0x40
[740051.476694] sock_write_iter+0x87/0x100
[740051.477531] __vfs_write+0xf6/0x150
[740051.478319] vfs_write+0xb3/0x1a0
[740051.479023] SyS_write+0x52/0xc0
[740051.479769] do_syscall_64+0x59/0x1c0
[740051.480559] entry_SYSCALL64_slow_path+0x25/0x25
[740051.481477] RIP: 0033:0x7f7e15511ca0
[740051.482271] RSP: 002b:00007f7d68cf52e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[740051.483771] RAX: ffffffffffffffda RBX: 0000000000000115 RCX: 00007f7e15511ca0
[740051.485197] RDX: 0000000000000006 RSI: 00007f7d32000020 RDI: 0000000000000115
[740051.486791] RBP: 00007f7d32000020 R08: 0000000000000000 R09: 00000000d6605690
[740051.492682] R10: 000000000001e5a4 R11: 0000000000000293 R12: 0000000000000006
[740051.494141] R13: 0000000000000006 R14: 00007f7d68cf5370 R15: 00007f7e0e029000
[740051.495676] Code: 41 56 41 55 41 54 49 89 fd 55 53 48 89 f5 48 89 d3
48 89 cf 45 89 c4 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38
31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 74 61
[740051.498941] RIP: xfrm_lookup+0x32/0x8a0 RSP: ffff9030bfc838e8
[740051.500030] CR2: 0000000000000020
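
The register dump and the Code: line in the trace above can be cross-checked mechanically. The following Python sketch is not part of the original report; it is a minimal illustration with hypothetical helper names (parse_registers, faulting_bytes, decode_mov_load). It decodes only the one instruction form that appears at the <48> marker (a REX.W mov that loads a 64-bit register from base register plus 8-bit displacement) and checks that base plus displacement equals the faulting address in CR2. It is not a general x86 disassembler.

```python
import re

# Kernel-side register values and Code: line copied from the oops above,
# with the mail client's line wrapping undone.
OOPS = """\
RAX: 0000000000000000 RBX: ffff9030bfc83960 RCX: 0000000000000000
RDX: ffff9030bfc83960 RSI: 0000000000000000 RDI: 0000000000000000
CR2: 0000000000000020 CR3: 00000005fd444005 CR4: 00000000001606e0
Code: 41 56 41 55 41 54 49 89 fd 55 53 48 89 f5 48 89 d3 48 89 cf 45 89 c4 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 74 61
"""

# Register numbering used by ModRM when no REX extension bits are set.
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(text):
    """Map register names such as RSI or CR2 to their integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def faulting_bytes(text):
    """Return the code bytes starting at the <..> marker of the Code: line."""
    line = next(l for l in text.splitlines() if l.startswith("Code:"))
    tokens = line.split()[1:]
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_load(code):
    """Decode `48 8b /r disp8` (mov r64, [r64 + disp8]); reject anything else."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    disp = code[3] - 0x100 if code[3] >= 0x80 else code[3]  # sign-extend disp8
    return REG64[reg], REG64[rm], disp


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    dest, base, disp = decode_mov_load(faulting_bytes(OOPS))
    print(f"faulting insn: mov {disp:#x}(%{base.lower()}),%{dest.lower()}")
    print(f"{base} = {regs[base]:#x}, CR2 = {regs['CR2']:#x}")
    print("base + disp == CR2:", regs[base] + disp == regs["CR2"])
```

For this trace the marked bytes decode to a load from offset 0x20 of RSI, and RSI is 0, which matches CR2 = 0x20. Under the x86-64 System V calling convention RSI carries the second function argument, which appears to be dst_orig in the 4.14 signature of xfrm_lookup; that last mapping is an inference from the source, not something the log states.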