Message-ID: <CAM_iQpU9TqPp4SKLa0Z=kbnaJzd=PsgcFKMX25W9YYrJp2658g@mail.gmail.com>
Date:   Wed, 30 Jan 2019 09:15:26 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Ivan Babrou <ivan@...udflare.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Ignat Korchagin <ignat@...udflare.com>,
        Shawn Bohrer <sbohrer@...udflare.com>,
        Jakub Sitnicki <jakub@...udflare.com>
Subject: Re: Crashes in skb clone/allocation in 4.19.18

On Wed, Jan 30, 2019 at 8:54 AM Ivan Babrou <ivan@...udflare.com> wrote:
>
> Hey,
>
> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> crashed with the following:
>
> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0

This looks more like an mm bug than a networking one.

Also, it is always helpful if you can map the RIP to source code,
using scripts/faddr2line or scripts/decode_stacktrace.sh.
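
As a rough illustration of that step, here is a minimal Python sketch that
pulls the kernel symbol+offset tokens out of the "RIP:" lines of an oops and
builds the matching scripts/faddr2line invocations. The helper names
(rip_symbols, faddr2line_command) and the "vmlinux" path are assumptions made
for this example, not part of the kernel tree; the vmlinux used must be the
one built for the crashing kernel, with debug info. The whole log can
alternatively be piped through scripts/decode_stacktrace.sh, which takes the
vmlinux path as its first argument and reads the trace on stdin.

```python
import re

# Matches the symbol+offset/size token on an oops "RIP:" line, e.g.
# "RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0". User-space RIPs such as
# "RIP: 0033:0x5581e1551ca0" carry no symbol and are deliberately skipped.
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-f]{4}:([A-Za-z_][\w.]*\+0x[0-9a-f]+/0x[0-9a-f]+)"
)


def rip_symbols(oops_text):
    """Return every func+offset/size token found on RIP lines (hypothetical helper)."""
    return RIP_RE.findall(oops_text)


def faddr2line_command(symbol, vmlinux="vmlinux"):
    """Build the argv for scripts/faddr2line; the vmlinux path is an assumption."""
    return ["./scripts/faddr2line", vmlinux, symbol]


# RIP lines copied from the two reports quoted in this message.
oops = """\
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
"""

commands = [faddr2line_command(sym) for sym in rip_symbols(oops)]
for argv in commands:
    # Print the command line to run from the top of the kernel source tree.
    print(" ".join(argv))
```

Running the printed commands from the kernel source tree should report the
source file and line for each faulting address, provided the build matches.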


Thanks.


> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 0f 84
> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
> knlGS:0000000000000000
> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
> kernel.perf_event_max_sample_rate to 24000
> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2313.500216] Call Trace:
> [ 2313.512833]  <IRQ>
> [ 2313.524748]  __alloc_skb+0x57/0x1d0
> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
> [ 2313.551845]  tcp_rcv_established+0x550/0x640
> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
> kernel.perf_event_max_sample_rate to 19000
> [ 2313.630948]  ip_rcv+0x52/0xd0
> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
> [ 2313.717786]  ? __queue_work+0x103/0x3e0
> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
> [ 2313.743125]  ? net_rx_action+0x138/0x360
> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
> [ 2313.767362]  ? irq_exit+0xb4/0xc0
> [ 2313.790680]  ? do_IRQ+0x85/0xd0
> [ 2313.790688]  ? common_interrupt+0xf/0xf
> [ 2313.790694]  </IRQ>
> [ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
> nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
> iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
> iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
> xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
> ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
> 8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
> irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
> ipmi_ssif crypto_simd cryptd
> [ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
> ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---
>
> Some nodes also crashed in skb_clone, rather than __alloc_skb:
>
> [ 3810.686137] general protection fault: 0000 [#1] SMP PTI
> [ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
> 4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
> T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
> 06/29/2018
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
> 4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
> 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 74 b2
> [ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
> [ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
> [ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
> [ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
> [ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
> [ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
> [ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
> knlGS:0000000000000000
> [ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
> [ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3810.878554] PKRU: 55555554
> [ 3810.884787] Call Trace:
> [ 3810.890601]  <IRQ>
> [ 3810.896116]  skb_clone+0x4d/0xb0
> [ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
> [ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
> [ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
> [ 3810.925172]  ? eth_header+0x26/0xc0
> [ 3810.932053]  ip_finish_output2+0x193/0x400
> [ 3810.939670]  ? ip_finish_output+0x139/0x270
> [ 3810.947241]  ip_output+0x6c/0xe0
> [ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
> [ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
> [ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
> [ 3810.977209]  ? tcp_current_mss+0x57/0xa0
> [ 3810.984493]  tcp_retransmit_skb+0x12/0x80
> [ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
> [ 3811.000754]  tcp_ack+0x9c4/0x11b0
> [ 3811.007416]  tcp_rcv_established+0x190/0x640
> [ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
> [ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
> [ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
> [ 3811.044859]  ip_local_deliver+0x6b/0xe0
> [ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
> [ 3811.059325]  ip_rcv+0x52/0xd0
> [ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
> [ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
> [ 3811.089652]  napi_gro_receive+0xba/0xe0
> [ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
> [ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
> [ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
> [ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
> [ 3811.129574]  net_rx_action+0x138/0x360
> [ 3811.136266]  __do_softirq+0xd8/0x2d2
> [ 3811.142679]  irq_exit+0xb4/0xc0
> [ 3811.148578]  do_IRQ+0x85/0xd0
> [ 3811.154254]  common_interrupt+0xf/0xf
> [ 3811.160585]  </IRQ>
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
> [ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
> 87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
> 8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
> 24 ee
> [ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
> [ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
> [ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
> [ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
> [ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
> [ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
> x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
> crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
> [ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
> devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
> tpm_tis_core tpm efivarfs ip_tables x_tables
> [ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---
>
> Is this known? Could it be related to this commit:
>
> * https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f
>
> Thanks!
