Message-ID: <20181203083413.115c9d16@shemminger-XPS-13-9360>
Date: Mon, 3 Dec 2018 08:34:13 -0800
From: Stephen Hemminger <stephen@...workplumber.org>
To: netdev@...r.kernel.org
Subject: Fw: [Bug 201849] New: hw csum failure - reproducible error
More checksum changes fallout?
Begin forwarded message:
Date: Mon, 03 Dec 2018 04:23:36 +0000
From: bugzilla-daemon@...zilla.kernel.org
To: stephen@...workplumber.org
Subject: [Bug 201849] New: hw csum failure - reproducible error
https://bugzilla.kernel.org/show_bug.cgi?id=201849
Bug ID: 201849
Summary: hw csum failure - reproducible error
Product: Networking
Version: 2.5
Kernel Version: 4.19.5-1.el7.elrepo.x86_64
Hardware: Intel
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: IPV4
Assignee: stephen@...workplumber.org
Reporter: jkyriannis@...ernet.org
Regression: No
On prior versions of the 4.19 kernel, including 4.19.5-1.el7.elrepo.x86_64, a
kernel error is generated whenever an IPv4 IGMP v2 query is received. This is
reproducible and has been confirmed on a quiet subnet containing only the host
and a router (manufactured by Juniper). Please see the tcpdump capture of the
packet below, followed immediately by the kernel error. Additional diagnostic
output follows after that. If any further information is needed, please
let me know. There are other bugs logged against "hw csum failure" messages,
but none correlate with this stack trace and diagnosis.
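For anyone wanting to check by hand that a captured IGMP v2 query carries a valid checksum, here is a minimal sketch of the software-side calculation (the RFC 1071 Internet checksum over the 8-byte IGMP message). It is not the kernel's code. The helper names are hypothetical, and the max response time of 100 tenths of a second is an assumption, since the report does not include the packet bytes.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmpv2_query(max_resp_tenths: int = 100,
                       group: bytes = b"\x00\x00\x00\x00") -> bytes:
    """Build an IGMP v2 membership query (type 0x11).

    A general query uses group 0.0.0.0; max_resp_tenths=100 is an
    assumed value, not taken from the capture in this report.
    """
    unsummed = struct.pack("!BBH4s", 0x11, max_resp_tenths, 0, group)
    csum = internet_checksum(unsummed)
    return struct.pack("!BBH4s", 0x11, max_resp_tenths, csum, group)


def igmp_checksum_ok(msg: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(msg) == 0
```

Feeding the IGMP payload bytes from a `tcpdump -x` capture to `igmp_checksum_ok()` would show whether the packet itself is valid in software, which helps separate a bad packet from a disagreement with the checksum value supplied by the NIC.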
---- tcpdump output and kernel error ----
22:53:21.235861 IP 10.10.9.145 > 224.0.0.1: igmp query v2
Dec 2 22:53:21 dtn1 kernel: enp4s0: hw csum failure
Dec 2 22:53:21 dtn1 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: P
O 4.19.5-1.el7.elrepo.x86_64 #1
Dec 2 22:53:21 dtn1 kernel: Hardware name: Supermicro Super Server/X10SRL-F,
BIOS 3.1 06/06/2018
Dec 2 22:53:21 dtn1 kernel: Call Trace:
Dec 2 22:53:21 dtn1 kernel: <IRQ>
Dec 2 22:53:21 dtn1 kernel: dump_stack+0x63/0x88
Dec 2 22:53:21 dtn1 kernel: netdev_rx_csum_fault+0x3a/0x40
Dec 2 22:53:21 dtn1 kernel: __skb_checksum_complete+0xd5/0xe0
Dec 2 22:53:21 dtn1 kernel: nf_ip_checksum+0xc9/0xf0
Dec 2 22:53:21 dtn1 kernel: nf_send_unreach+0x6a/0xa0 [nf_reject_ipv4]
Dec 2 22:53:21 dtn1 kernel: reject_tg+0x95/0xa0 [ipt_REJECT]
Dec 2 22:53:21 dtn1 kernel: ipt_do_table+0x2e7/0x630 [ip_tables]
Dec 2 22:53:21 dtn1 kernel: ? nf_nat_setup_info+0x93/0x290 [nf_nat]
Dec 2 22:53:21 dtn1 kernel: iptable_filter_hook+0x1f/0x30 [iptable_filter]
Dec 2 22:53:21 dtn1 kernel: nf_hook_slow+0x42/0xc0
Dec 2 22:53:21 dtn1 kernel: ip_local_deliver+0xcd/0xe0
Dec 2 22:53:21 dtn1 kernel: ? ip_sublist_rcv_finish+0x80/0x80
Dec 2 22:53:21 dtn1 kernel: ip_rcv_finish+0x83/0xa0
Dec 2 22:53:21 dtn1 kernel: ip_rcv+0x56/0xd0
Dec 2 22:53:21 dtn1 kernel: ? ip_local_deliver_finish+0x1e0/0x1e0
Dec 2 22:53:21 dtn1 kernel: __netif_receive_skb_one_core+0x57/0x80
Dec 2 22:53:21 dtn1 kernel: __netif_receive_skb+0x18/0x60
Dec 2 22:53:21 dtn1 kernel: netif_receive_skb_internal+0x45/0xf0
Dec 2 22:53:21 dtn1 kernel: napi_gro_frags+0x1a4/0x220
Dec 2 22:53:21 dtn1 kernel: mlx4_en_process_rx_cq+0x765/0xcc0 [mlx4_en]
Dec 2 22:53:21 dtn1 kernel: mlx4_en_poll_rx_cq+0x5f/0x110 [mlx4_en]
Dec 2 22:53:21 dtn1 kernel: net_rx_action+0x289/0x3d0
Dec 2 22:53:21 dtn1 kernel: __do_softirq+0xd1/0x287
Dec 2 22:53:21 dtn1 kernel: irq_exit+0xe8/0x100
Dec 2 22:53:21 dtn1 kernel: do_IRQ+0x59/0xe0
Dec 2 22:53:21 dtn1 kernel: common_interrupt+0xf/0xf
Dec 2 22:53:21 dtn1 kernel: </IRQ>
Dec 2 22:53:21 dtn1 kernel: RIP: 0010:cpu_idle_poll+0x39/0x190
Dec 2 22:53:21 dtn1 kernel: Code: 2d 2f 89 ff 65 44 8b 25 a5 a2 79 7e 0f 1f 44
00 00 fb 66 0f 1f 44 00 00 65 48 8b 1c 25 80 5c 01 00 48 8b 03 a8 08 74 0b eb
1c <f3> 90 48 8b 03 a8 08 75 13 8b 05 a0 11 bf 00 85 c0 75 ed e8 1f db
Dec 2 22:53:21 dtn1 kernel: RSP: 0018:ffffc9000633be88 EFLAGS: 00000202
ORIG_RAX: ffffffffffffffd8
Dec 2 22:53:21 dtn1 kernel: RAX: 0000000000000001 RBX: ffff889038ef1700 RCX:
000000000000001f
Dec 2 22:53:21 dtn1 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI:
ffff88903fb23b80
Dec 2 22:53:21 dtn1 kernel: RBP: ffffc9000633bea0 R08: 0000000000000002 R09:
ffffa562f1d001dd
Dec 2 22:53:21 dtn1 kernel: R10: 0000000000000004 R11: 0000000000000000 R12:
0000000000000004
Dec 2 22:53:21 dtn1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
ffff889038ef1700
Dec 2 22:53:21 dtn1 kernel: ? cpu_idle_poll+0x13/0x190
Dec 2 22:53:21 dtn1 kernel: do_idle+0x61/0x280
Dec 2 22:53:21 dtn1 kernel: cpu_startup_entry+0x73/0x80
Dec 2 22:53:21 dtn1 kernel: start_secondary+0x1ae/0x200
Dec 2 22:53:21 dtn1 kernel: secondary_startup_64+0xa4/0xb0
22:55:26.235749 IP 10.10.9.145 > 224.0.0.1: igmp query v2
Dec 2 22:55:26 dtn1 kernel: enp4s0: hw csum failure
Dec 2 22:55:26 dtn1 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: P
O 4.19.5-1.el7.elrepo.x86_64 #1
Dec 2 22:55:26 dtn1 kernel: Hardware name: Supermicro Super Server/X10SRL-F,
BIOS 3.1 06/06/2018
Dec 2 22:55:26 dtn1 kernel: Call Trace:
Dec 2 22:55:26 dtn1 kernel: <IRQ>
Dec 2 22:55:26 dtn1 kernel: dump_stack+0x63/0x88
Dec 2 22:55:26 dtn1 kernel: netdev_rx_csum_fault+0x3a/0x40
Dec 2 22:55:26 dtn1 kernel: __skb_checksum_complete+0xd5/0xe0
Dec 2 22:55:26 dtn1 kernel: nf_ip_checksum+0xc9/0xf0
Dec 2 22:55:26 dtn1 kernel: nf_send_unreach+0x6a/0xa0 [nf_reject_ipv4]
Dec 2 22:55:26 dtn1 kernel: reject_tg+0x95/0xa0 [ipt_REJECT]
Dec 2 22:55:26 dtn1 kernel: ipt_do_table+0x2e7/0x630 [ip_tables]
Dec 2 22:55:26 dtn1 kernel: ? nf_nat_setup_info+0x93/0x290 [nf_nat]
Dec 2 22:55:26 dtn1 kernel: iptable_filter_hook+0x1f/0x30 [iptable_filter]
Dec 2 22:55:26 dtn1 kernel: nf_hook_slow+0x42/0xc0
Dec 2 22:55:26 dtn1 kernel: ip_local_deliver+0xcd/0xe0
Dec 2 22:55:26 dtn1 kernel: ? ip_sublist_rcv_finish+0x80/0x80
Dec 2 22:55:26 dtn1 kernel: ip_rcv_finish+0x83/0xa0
Dec 2 22:55:26 dtn1 kernel: ip_rcv+0x56/0xd0
Dec 2 22:55:26 dtn1 kernel: ? ip_local_deliver_finish+0x1e0/0x1e0
Dec 2 22:55:26 dtn1 kernel: __netif_receive_skb_one_core+0x57/0x80
Dec 2 22:55:26 dtn1 kernel: __netif_receive_skb+0x18/0x60
Dec 2 22:55:26 dtn1 kernel: netif_receive_skb_internal+0x45/0xf0
Dec 2 22:55:26 dtn1 kernel: napi_gro_frags+0x1a4/0x220
Dec 2 22:55:26 dtn1 kernel: mlx4_en_process_rx_cq+0x765/0xcc0 [mlx4_en]
Dec 2 22:55:26 dtn1 kernel: mlx4_en_poll_rx_cq+0x5f/0x110 [mlx4_en]
Dec 2 22:55:26 dtn1 kernel: net_rx_action+0x289/0x3d0
Dec 2 22:55:26 dtn1 kernel: __do_softirq+0xd1/0x287
Dec 2 22:55:26 dtn1 kernel: irq_exit+0xe8/0x100
Dec 2 22:55:26 dtn1 kernel: do_IRQ+0x59/0xe0
Dec 2 22:55:26 dtn1 kernel: common_interrupt+0xf/0xf
Dec 2 22:55:26 dtn1 kernel: </IRQ>
Dec 2 22:55:26 dtn1 kernel: RIP: 0010:cpu_idle_poll+0x3e/0x190
Dec 2 22:55:26 dtn1 kernel: Code: 44 8b 25 a5 a2 79 7e 0f 1f 44 00 00 fb 66 0f
1f 44 00 00 65 48 8b 1c 25 80 5c 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 48 8b
03 <a8> 08 75 13 8b 05 a0 11 bf 00 85 c0 75 ed e8 1f db 8a ff 85 c0 75
Dec 2 22:55:26 dtn1 kernel: RSP: 0018:ffffc9000633be88 EFLAGS: 00000202
ORIG_RAX: ffffffffffffffd8
Dec 2 22:55:26 dtn1 kernel: RAX: 0000000080200000 RBX: ffff889038ef1700 RCX:
000000000000001f
Dec 2 22:55:26 dtn1 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI:
ffff88903fb23b80
Dec 2 22:55:26 dtn1 kernel: RBP: ffffc9000633bea0 R08: 0000000000000002 R09:
ffffa562f1d001dd
Dec 2 22:55:26 dtn1 kernel: R10: 0000000000000004 R11: 0000000000000008 R12:
0000000000000004
Dec 2 22:55:26 dtn1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
ffff889038ef1700
Dec 2 22:55:26 dtn1 kernel: ? cpu_idle_poll+0x13/0x190
Dec 2 22:55:26 dtn1 kernel: do_idle+0x61/0x280
Dec 2 22:55:26 dtn1 kernel: cpu_startup_entry+0x73/0x80
Dec 2 22:55:26 dtn1 kernel: start_secondary+0x1ae/0x200
Dec 2 22:55:26 dtn1 kernel: secondary_startup_64+0xa4/0xb0
-------------
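To compare this trace against other "hw csum failure" reports, it can help to reduce each trace to its sequence of function names. The following is a rough sketch that assumes the syslog line format shown above; the regular expression and the function name are illustrative, not part of any existing tool. Frames marked with "?" (which the kernel prints for unreliable entries) and register/RIP lines are skipped.

```python
import re

# Matches lines such as:
#   "... kernel: nf_send_unreach+0x6a/0xa0 [nf_reject_ipv4]"
# and flags frames prefixed with "? ".
FRAME = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def trace_signature(log_text: str) -> list[str]:
    """Return the ordered list of reliable frame symbols in a log excerpt."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if m and not m.group("unreliable"):
            frames.append(m.group("symbol"))
    return frames
```

Two reports whose signatures share the `nf_send_unreach` -> `nf_ip_checksum` -> `__skb_checksum_complete` path under `mlx4_en_process_rx_cq` would be likely candidates for the same issue.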
----Mellanox ConnectX-3 Driver information----
# ethtool -i enp4s0
driver: mlx4_en
version: 4.0-0
firmware-version: 2.42.5000
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
#
--------------
# lspci -v -s 04:00.0
04:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies Device 0064
Physical Slot: 0-1
Flags: bus master, fast devsel, latency 0, IRQ 38, NUMA node 0
Memory at fb200000 (64-bit, non-prefetchable) [size=1M]
Memory at f9800000 (64-bit, prefetchable) [size=8M]
Expansion ROM at fb100000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Capabilities: [48] Vital Product Data
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [148] Device Serial Number 24-8a-07-03-00-c2-59-90
Capabilities: [154] Advanced Error Reporting
Capabilities: [18c] #19
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
#
----------------
# ip maddress show dev enp4s0
4: enp4s0
link 33:33:00:00:00:01
link 01:00:5e:00:00:01
link 33:33:ff:c2:59:90
link 33:33:ff:00:00:03
inet 224.0.0.1
inet6 ff02::1:ff00:3
inet6 ff02::1:ffc2:5990
inet6 ff02::1
inet6 ff01::1
#
----------------
# ifconfig enp4s0
enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.10.9.146 netmask 255.255.255.252 broadcast 10.10.9.147
inet6 2001:DB8:0:926::3 prefixlen 127 scopeid 0x0<global>
inet6 fe80::268a:7ff:fec2:5990 prefixlen 64 scopeid 0x20<link>
ether 24:8a:07:c2:59:90 txqueuelen 10000 (Ethernet)
RX packets 167195079 bytes 1094206597539 (1019.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 123950796 bytes 637967787689 (594.1 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
#
------------------
--
You are receiving this mail because:
You are the assignee for the bug.