Date:   Wed, 24 Oct 2018 21:41:28 +0200
From:   Andre Tomt <andre@...t.net>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     Stephen Hemminger <stephen@...workplumber.org>,
        netdev <netdev@...r.kernel.org>, rossi.f@...ind.it,
        Dimitris Michailidis <dmichail@...gle.com>
Subject: Re: Fw: [Bug 201423] New: eth0: hw csum failure

On 21.10.2018 15:34, Andre Tomt wrote:
> On 20.10.2018 00:25, Eric Dumazet wrote:
>> On 10/19/2018 02:58 PM, Eric Dumazet wrote:
>>> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
>>>> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@...t.net> wrote:
>>>>> I've seen similar failures on several systems with mlx4 cards when
>>>>> using 4.18.x, that is, a hw csum failure followed by a backtrace.
>>>>>
>>>>> Only seems to happen on systems dealing with quite a bit of UDP.
>>>>>
>>>>
>>>> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
>>>> but CHECKSUM_UNNECESSARY
>>>>
>>>> It would be nice to track this a bit further, maybe by providing the
>>>> full packet content.
>>>>
> <snip>
>>>
>>> As a matter of fact, Dimitris found the issue in the patch and is
>>> working on a fix involving csum_block_sub().
>>>
>>> The problem comes from trimming an odd number of bytes.
>>
>> More exactly, trimming bytes starting at an odd offset.
> 
> No hw csum failures here since I deployed Dimitris' fix on top of 4.18.16
> 32 hours ago.
> 
> Thanks
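
The odd-offset problem mentioned in the quoted discussion can be illustrated
outside the kernel. What follows is only a minimal Python sketch of the
arithmetic, not the actual fix: the helper names (csum16, swap16, block_sub,
same_csum) are hypothetical and merely mirror the idea behind the kernel's
csum_block_sub(). It assumes a 16-bit one's-complement sum over big-endian
words. When the removed block starts at an odd offset, its bytes sit in the
opposite halves of the 16-bit words, so its partial sum has to be byte-swapped
before it is subtracted.

```python
def csum16(data: bytes) -> int:
    """16-bit one's-complement sum of data (big-endian words, odd tail zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def swap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)


def csum_add(a: int, b: int) -> int:
    """One's-complement addition of two 16-bit sums."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    """One's-complement subtraction: add the bitwise complement."""
    return csum_add(a, ~b & 0xFFFF)


def block_sub(total: int, block_sum: int, offset: int) -> int:
    """Remove a block's sum from total; the block started at byte `offset`."""
    if offset & 1:
        # Odd offset: the block's bytes occupy the opposite word halves.
        block_sum = swap16(block_sum)
    return csum_sub(total, block_sum)


def same_csum(a: int, b: int) -> bool:
    """Compare sums, treating 0x0000 and 0xFFFF as the same value (both are zero)."""
    return a % 0xFFFF == b % 0xFFFF


if __name__ == "__main__":
    packet = bytes(range(1, 21))   # made-up 20-byte payload
    offset = 13                    # trim everything from an odd offset onward
    total = csum16(packet)
    tail_sum = csum16(packet[offset:])
    expected = csum16(packet[:offset])
    print("with swap:   ", same_csum(block_sub(total, tail_sum, offset), expected))
    print("without swap:", same_csum(csum_sub(total, tail_sum), expected))
```

Subtracting the tail's sum without the swap yields a value that no longer
matches the remaining bytes, which is the kind of mismatch that would surface
as a "hw csum failure" when a CHECKSUM_COMPLETE value is adjusted after
trimming.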

It eventually showed up again with mlx4, on 4.18.16 + fix and also on 
4.19. I still do not have a useful packet capture.

The affected machine is running a torrent client serving up various Linux distributions.

> [116116.994519] p0xe0: hw csum failure
> [116116.994550] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
> [116116.994551] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [116116.994555] Call Trace:
> [116116.994558]  <IRQ>
> [116116.994567]  dump_stack+0x5c/0x7b
> [116116.994574]  __skb_gro_checksum_complete+0x9a/0xa0
> [116116.994580]  udp6_gro_receive+0x211/0x290
> [116116.994585]  ipv6_gro_receive+0x1b1/0x3a0
> [116116.994588]  dev_gro_receive+0x3a0/0x620
> [116116.994590]  ? __build_skb+0x25/0xe0
> [116116.994592]  napi_gro_frags+0xa8/0x220
> [116116.994598]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [116116.994611]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [116116.994621]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [116116.994629]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [116116.994635]  net_rx_action+0xe0/0x2e0
> [116116.994641]  __do_softirq+0xd8/0x2ff
> [116116.994646]  irq_exit+0xbd/0xd0
> [116116.994650]  do_IRQ+0x85/0xd0
> [116116.994656]  common_interrupt+0xf/0xf
> [116116.994659]  </IRQ>
> [116116.994665] RIP: 0010:cpuidle_enter_state+0xb3/0x310
> [116116.994668] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> [116116.994669] RSP: 0018:ffff924a0635bea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
> [116116.994671] RAX: ffff9016ffb60fc0 RBX: 0000699b9835d616 RCX: 000000000000001f
> [116116.994673] RDX: 0000699b9835d616 RSI: 00000000229837f7 RDI: 0000000000000000
> [116116.994674] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000020840
> [116116.994675] R10: ffff924a0635be88 R11: 0000000000000367 R12: ffff9016ffb69aa8
> [116116.994676] R13: ffffffffa50ac638 R14: 0000000000000000 R15: 0000699b981c63b9
> [116116.994680]  ? cpuidle_enter_state+0x90/0x310
> [116116.994685]  do_idle+0x1d0/0x240
> [116116.994687]  cpu_startup_entry+0x5f/0x70
> [116116.994690]  start_secondary+0x185/0x1a0
> [116116.994693]  secondary_startup_64+0xa4/0xb0
> [116116.994709] p0xe0: hw csum failure
> [116116.994739] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
> [116116.994740] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> [116116.994741] Call Trace:
> [116116.994743]  <IRQ>
> [116116.994746]  dump_stack+0x5c/0x7b
> [116116.994751]  __skb_checksum_complete+0xb8/0xd0
> [116116.994755]  __udp6_lib_rcv+0xa0e/0xa20
> [116116.994764]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [116116.994768]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> [116116.994771]  ip6_input_finish+0xc0/0x460
> [116116.994774]  ip6_input+0x2b/0x90
> [116116.994776]  ? ip6_make_skb+0x1b0/0x1b0
> [116116.994778]  ipv6_rcv+0x54/0xb0
> [116116.994781]  __netif_receive_skb_one_core+0x42/0x50
> [116116.994784]  netif_receive_skb_internal+0x24/0xb0
> [116116.994786]  napi_gro_frags+0x171/0x220
> [116116.994790]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> [116116.994798]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> [116116.994803]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> [116116.994806]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> [116116.994808]  net_rx_action+0xe0/0x2e0
> [116116.994810]  __do_softirq+0xd8/0x2ff
> [116116.994812]  irq_exit+0xbd/0xd0
> [116116.994814]  do_IRQ+0x85/0xd0
> [116116.994816]  common_interrupt+0xf/0xf
> [116116.994818]  </IRQ>
> [116116.994821] RIP: 0010:cpuidle_enter_state+0xb3/0x310
> [116116.994823] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> [116116.994824] RSP: 0018:ffff924a0635bea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
> [116116.994825] RAX: ffff9016ffb60fc0 RBX: 0000699b9835d616 RCX: 000000000000001f
> [116116.994826] RDX: 0000699b9835d616 RSI: 00000000229837f7 RDI: 0000000000000000
> [116116.994827] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000020840
> [116116.994828] R10: ffff924a0635be88 R11: 0000000000000367 R12: ffff9016ffb69aa8
> [116116.994829] R13: ffffffffa50ac638 R14: 0000000000000000 R15: 0000699b981c63b9
> [116116.994832]  ? cpuidle_enter_state+0x90/0x310
> [116116.994835]  do_idle+0x1d0/0x240
> [116116.994837]  cpu_startup_entry+0x5f/0x70
> [116116.994838]  start_secondary+0x185/0x1a0
> [116116.994840]  secondary_startup_64+0xa4/0xb0
