Message-ID: <859c86f2-5d57-c538-8f61-6f933d239ada@gmx.ch>
Date:   Wed, 31 Jan 2018 21:26:51 +0100
From:   Markus Berner <Markus.Berner@....ch>
To:     steffen.klassert@...unet.com, netdev-list@...oetigt.de
Cc:     netdev@...r.kernel.org
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in
 xfrm_lookup

 > I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
 > 4.14.11 (see kernel log below).

We are running into the same problem on our production machine, which 
runs CoreOS 1576.5.0 Stable with the 4.14.11 kernel on a KVM cloud VM. 
It is not as easy to reproduce in our case, though: we observed a total 
of 5 crashes in the last 2 weeks, all except one on the production machine.

 > I still can't reproduce it with my tests. This is probably some race
 > triggered due to your aggressive roadwarrior setup which I don't have.

We have a setup similar to Tobias's:
- 2 Network Interfaces (KVM/virtio): Public and local VLAN
- Strongswan VPN in Tunnel mode between local VLAN and on-premise 
network, running in a Docker container
- Quite a few iptables NAT and forwarding rules regarding other local 
Docker containers

Some Observations:
- The workaround Tobias described a while back, pinning the IRQs of the 
Rx/Tx queues of all network interfaces to CPU0, did not prevent the 
crashes in our case
- The bug does not seem to correlate with load in our case, though load 
is generally quite low.
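
For anyone who wants to repeat the IRQ pinning, here is a minimal sketch 
of the idea, not the exact commands Tobias or we used. It picks the IRQ 
numbers of the network queues out of /proc/interrupts and lists the 
writes to /proc/irq/<N>/smp_affinity that would bind them to one CPU. 
The sample table, the virtio0 queue names and the helper names are made 
up for illustration; on a real machine the text would come from 
/proc/interrupts, and the writes need root.

```python
import re

def irqs_for_device(interrupts_text, name_pattern):
    """Return the IRQ numbers whose /proc/interrupts line matches name_pattern."""
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)  # numbered IRQ lines only
        if m and re.search(name_pattern, line):
            irqs.append(int(m.group(1)))
    return irqs

def cpu_mask(cpu):
    """Hex bitmask selecting a single CPU, in the format smp_affinity expects."""
    return format(1 << cpu, "x")

def affinity_writes(irqs, cpu=0):
    """List the (path, value) pairs that would pin each IRQ to the given CPU."""
    return [(f"/proc/irq/{irq}/smp_affinity", cpu_mask(cpu)) for irq in irqs]

# Hypothetical excerpt; real queue names and IRQ numbers will differ.
SAMPLE = """\
           CPU0       CPU1
 24:       1234          0   PCI-MSI 49153-edge      virtio0-input.0
 25:        567          0   PCI-MSI 49154-edge      virtio0-output.0
 26:         89          0   PCI-MSI 65536-edge      virtio1-requests
"""

if __name__ == "__main__":
    queue_irqs = irqs_for_device(SAMPLE, r"virtio0-(input|output)")
    for path, value in affinity_writes(queue_irqs, cpu=0):
        print(f"echo {value} > {path}")
```

The sketch only prints the writes, so they can be reviewed before 
anything is applied to a production machine.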

I am happy to help if I can, but unfortunately our options are a bit 
limited, both because we lack kernel development know-how and because 
we cannot easily try out configuration changes on the production 
machine. I subscribed to LKML only now in order to respond, so I hope 
this reply works (and goes to the correct message).

Markus

Example Stack Trace below:

[740051.374799] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000020
[740051.379386] IP: xfrm_lookup+0x32/0x8a0
[740051.379941] PGD 80000004648d6067 P4D 80000004648d6067 PUD 461405067 
PMD 0
[740051.380697] Oops: 0000 [#1] SMP PTI
[740051.381060] Modules linked in: iptable_mangle drbg authenc echainiv 
esp4 xfrm6_mode_tunnel xfrm4_mode_tunnel cbc binfmt_misc veth netconsole 
configfs softdog xt_nat nf_log_ipv4 nf_log_common xt_LOG xt_limit 
xt_policy xt_comment xt_multiport ipt_MASQUERADE nf_nat_masquerade_ipv4 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter 
xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic br_netfilter 
bridge stp llc overlay sb_edac edac_core nls_ascii nls_cp437 kvm_intel 
vfat fat kvm mousedev psmouse i2c_piix4 irqbypass evdev virtio_balloon 
i2c_core pvpanic button sch_fq_codel hid_generic usbhid hid ext4 crc16 
mbcache jbd2 fscrypto dm_verity dm_bufio virtio_blk virtio_net uhci_hcd 
ehci_pci ata_piix ehci_hcd crc32c_intel
[740051.389167]  libata virtio_pci usbcore virtio_ring scsi_mod virtio 
usb_common dm_mirror dm_region_hash dm_log dm_mod dax
[740051.391444] CPU: 2 PID: 13516 Comm: java Not tainted 4.14.11-coreos #1
[740051.392792] Hardware name: QEMU CloudSigma, BIOS Bochs 01/01/2011
[740051.394120] task: ffff903022738000 task.stack: ffffa791c7680000
[740051.395399] RIP: 0010:xfrm_lookup+0x32/0x8a0
[740051.396456] RSP: 0018:ffff9030bfc838e8 EFLAGS: 00010246
[740051.397656] RAX: 0000000000000000 RBX: ffff9030bfc83960 RCX: 
0000000000000000
[740051.399526] RDX: ffff9030bfc83960 RSI: 0000000000000000 RDI: 
0000000000000000
[740051.401471] RBP: 0000000000000000 R08: 0000000000000002 R09: 
0000000046d7a6d9
[740051.403260] R10: 00000000ffffffff R11: 0000000062f99322 R12: 
0000000000000002
[740051.405049] R13: ffffffff980e3080 R14: ffff903095c8a0a0 R15: 
ffffffff9812dc20
[740051.406767] FS:  00007f7d68cf6700(0000) GS:ffff9030bfc80000(0000) 
knlGS:0000000000000000
[740051.408900] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[740051.410159] CR2: 0000000000000020 CR3: 00000005fd444005 CR4: 
00000000001606e0
[740051.411805] Call Trace:
[740051.412534]  <IRQ>
[740051.413190]  __xfrm_route_forward+0x61/0x100
[740051.414198]  ip_forward+0x39e/0x470
[740051.415148]  ? ip_rcv_finish+0xa5/0x3f0
[740051.416225]  br_netfilter_enable+0x10c/0x3e0 [br_netfilter]
[740051.417491]  nf_hook_slow+0x39/0xb0
[740051.418530]  ip_rcv+0x303/0x3a0
[740051.419647]  ? inet_del_offload+0x40/0x40
[740051.420303]  __netif_receive_skb_core+0x2c9/0xb60
[740051.420998]  ? x2apic_send_IPI+0x46/0x50
[740051.421648]  ? check_preempt_curr+0x56/0x90
[740051.422309]  ? ttwu_do_wakeup+0x19/0x150
[740051.422920]  ? netif_receive_skb_internal+0x42/0xf0
[740051.423710]  netif_receive_skb_internal+0x42/0xf0
[740051.424717]  br_port_flags_change+0x1d4/0x260 [bridge]
[740051.425884]  ? br_fdb_update+0xc3/0x2c0 [bridge]
[740051.427019]  br_handle_frame_finish+0x1e2/0x510 [bridge]
[740051.428339]  ? lock_timer_base+0x67/0x80
[740051.429036]  ? ipt_do_table+0x35f/0x610
[740051.429906]  ? br_port_flags_change+0x260/0x260 [bridge]
[740051.430683]  br_nf_hook_thresh+0xde/0x12a0 [br_netfilter]
[740051.431465]  ? br_port_flags_change+0x260/0x260 [bridge]
[740051.432217]  br_nf_hook_thresh+0xa8c/0x12a0 [br_netfilter]
[740051.433032]  ? br_port_flags_change+0x260/0x260 [bridge]
[740051.433966]  ? nf_nat_ipv4_in+0x28/0x80 [nf_nat_ipv4]
[740051.434848]  br_nf_hook_thresh+0xe2a/0x12a0 [br_netfilter]
[740051.435639]  ? br_nf_hook_thresh+0x910/0x12a0 [br_netfilter]
[740051.436775]  nf_hook_slow+0x39/0xb0
[740051.437613]  br_handle_frame+0x1f0/0x9b0 [bridge]
[740051.438515]  ? br_port_flags_change+0x260/0x260 [bridge]
[740051.439566]  __netif_receive_skb_core+0x3d2/0xb60
[740051.440477]  ? process_backlog+0x92/0x140
[740051.441324]  process_backlog+0x92/0x140
[740051.442146]  net_rx_action+0x261/0x3a0
[740051.442774]  __do_softirq+0xf7/0x285
[740051.443534]  do_softirq_own_stack+0x2a/0x40
[740051.444395]  </IRQ>
[740051.444986]  do_softirq.part.15+0x3d/0x50
[740051.445823]  __local_bh_enable_ip+0x55/0x60
[740051.446643]  ip_finish_output2+0x18b/0x380
[740051.447496]  ip_output+0x71/0xe0
[740051.448217]  ? ip_fragment.constprop.47+0x80/0x80
[740051.471709]  tcp_transmit_skb+0x524/0x9a0
[740051.472551]  tcp_write_xmit+0x1e7/0xfb0
[740051.473370]  __tcp_push_pending_frames+0x2d/0xd0
[740051.474294]  tcp_sendmsg_locked+0x5ae/0xe10
[740051.475167]  tcp_sendmsg+0x27/0x40
[740051.475905]  sock_sendmsg+0x30/0x40
[740051.476694]  sock_write_iter+0x87/0x100
[740051.477531]  __vfs_write+0xf6/0x150
[740051.478319]  vfs_write+0xb3/0x1a0
[740051.479023]  SyS_write+0x52/0xc0
[740051.479769]  do_syscall_64+0x59/0x1c0
[740051.480559]  entry_SYSCALL64_slow_path+0x25/0x25
[740051.481477] RIP: 0033:0x7f7e15511ca0
[740051.482271] RSP: 002b:00007f7d68cf52e0 EFLAGS: 00000293 ORIG_RAX: 
0000000000000001
[740051.483771] RAX: ffffffffffffffda RBX: 0000000000000115 RCX: 
00007f7e15511ca0
[740051.485197] RDX: 0000000000000006 RSI: 00007f7d32000020 RDI: 
0000000000000115
[740051.486791] RBP: 00007f7d32000020 R08: 0000000000000000 R09: 
00000000d6605690
[740051.492682] R10: 000000000001e5a4 R11: 0000000000000293 R12: 
0000000000000006
[740051.494141] R13: 0000000000000006 R14: 00007f7d68cf5370 R15: 
00007f7e0e029000
[740051.495676] Code: 41 56 41 55 41 54 49 89 fd 55 53 48 89 f5 48 89 d3 
48 89 cf 45 89 c4 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 
31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 74 61
[740051.498941] RIP: xfrm_lookup+0x32/0x8a0 RSP: ffff9030bfc838e8
[740051.500030] CR2: 0000000000000020
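
As a reading aid for the trace above, here is a minimal sketch that 
decodes the faulting instruction from the Code: line and checks it 
against CR2. It is a diagnosis aid under stated assumptions, not a fix. 
The byte string and register values are excerpted from the trace; the 
helper names are hypothetical, and the decoder handles only the one 
instruction form that appears here (a REX.W mov load with an 8-bit 
displacement).

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line, count=4):
    """Return `count` bytes starting at the <..> marker of an oops Code: line."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:start + count]]

def decode_mov_load(insn):
    """Decode `mov disp8(%base), %dest` (REX.W 8B /r); return (base, disp, dest)."""
    rex, opcode, modrm, disp = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [base + disp8] form is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS[rm], disp, REGS[reg]

# Excerpted from the trace above.
CODE_LINE = "Code: 31 c0 <48> 8b 46 20 48 85 c9"
REGISTERS = {"rsi": 0x0}
CR2 = 0x20

base, disp, dest = decode_mov_load(faulting_bytes(CODE_LINE))
fault_address = REGISTERS[base] + disp

if __name__ == "__main__":
    print(f"mov {disp:#x}(%{base}), %{dest} -> fault at {fault_address:#x}")
```

The computed address equals CR2, so the fault is a load at offset 0x20 
from a NULL pointer held in RSI. RSI is the second-argument register in 
the x86-64 calling convention, and the fault occurs only 0x32 bytes into 
xfrm_lookup.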
