Date:   Tue, 16 Oct 2018 06:00:48 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     andre@...t.net
Cc:     Stephen Hemminger <stephen@...workplumber.org>,
        netdev <netdev@...r.kernel.org>, rossi.f@...ind.it
Subject: Re: Fw: [Bug 201423] New: eth0: hw csum failure

On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt <andre@...t.net> wrote:
>
> On 15.10.2018 17:41, Eric Dumazet wrote:
> > On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> >> Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> >> got the following first bad commit:
> >>
> >> commit 88078d98d1bb085d72af8437707279e203524fa5
> >> Author: Eric Dumazet <edumazet@...gle.com>
> >> Date:   Wed Apr 18 11:43:15 2018 -0700
> >>
> >>      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >>
> >>      After working on IP defragmentation lately, I found that some large
> >>      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >>      zero paddings on the last (small) fragment.
> >>
> >>      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >>      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >>      fragments had CHECKSUM_COMPLETE set.
> >>
> >>      We can instead compute the checksum of the part we are trimming,
> >>      usually smaller than the part we keep.
> >>
> >>      Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> >>      Signed-off-by: David S. Miller <davem@...emloft.net>
> >>
> >
> > Thanks for bisecting !
> >
> > This commit is known to expose some NIC/driver bugs.
> >
> > Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> > ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> >
> > I assume SKY2_HW_NEW_LE is not set on your NIC ?
> >
>
> I've seen similar on several systems with mlx4 cards when using 4.18.x -
> that is hw csum failure followed by some backtrace.
>
> Only seems to happen on systems dealing with quite a bit of UDP.
>

Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
but CHECKSUM_UNNECESSARY.

It would be nice to track this a bit further, maybe by providing the
full packet content.
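
With the full packet content in hand, the UDP checksum can be checked
offline. Below is a minimal userspace sketch in Python, assuming the IPv6
source and destination addresses and the UDP segment (header plus payload)
have already been extracted from a capture as raw bytes. The function names
are hypothetical and are not kernel or mlx4 APIs; udp6_build() exists only
to produce a known-good segment to test against.

```python
import struct

UDP_PROTO = 17  # IPv6 Next Header value for UDP


def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's-complement sum of data (big-endian words)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: bytes, dst: bytes, length: int) -> bytes:
    """IPv6 pseudo-header: addresses, 32-bit length, 3 zero bytes, next header."""
    return src + dst + struct.pack("!I3xB", length, UDP_PROTO)


def udp6_checksum_ok(src: bytes, dst: bytes, udp_segment: bytes) -> bool:
    """True if the segment's checksum field is consistent with its content."""
    data = pseudo_header(src, dst, len(udp_segment)) + udp_segment
    # Summing over data that includes a valid checksum yields all ones.
    return ones_complement_sum(data) == 0xFFFF


def udp6_build(src: bytes, dst: bytes, sport: int, dport: int,
               payload: bytes) -> bytes:
    """Build a UDP segment with a correct checksum (test helper)."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    partial = ones_complement_sum(pseudo_header(src, dst, length)
                                  + header + payload)
    csum = (~partial & 0xFFFF) or 0xFFFF  # a computed 0 is sent as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload
```

If udp6_checksum_ok() returns True for a packet that triggered the
warning, the packet itself is fine on the wire and the mismatch most
likely comes from the checksum value handed up with the skb.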

> Example from 4.18.10:
> > [635607.740574] p0xe0: hw csum failure
> > [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> > [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> > [635607.740599] Call Trace:
> > [635607.740602]  <IRQ>
> > [635607.740611]  dump_stack+0x5c/0x7b
> > [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> > [635607.740621]  udp6_gro_receive+0x211/0x290
> > [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> > [635607.740627]  dev_gro_receive+0x33e/0x550
> > [635607.740628]  napi_gro_frags+0xa2/0x210
> > [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> > [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> > [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> > [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> > [635607.740658]  net_rx_action+0xe0/0x2e0
> > [635607.740662]  __do_softirq+0xd8/0x2e5
> > [635607.740666]  irq_exit+0xb4/0xc0
> > [635607.740667]  do_IRQ+0x85/0xd0
> > [635607.740670]  common_interrupt+0xf/0xf
> > [635607.740671]  </IRQ>
> > [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> > [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> > [635607.740701] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> > [635607.740703] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> > [635607.740703] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> > [635607.740704] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> > [635607.740705] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> > [635607.740706] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> > [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> > [635607.740712]  do_idle+0x1d0/0x240
> > [635607.740715]  cpu_startup_entry+0x5f/0x70
> > [635607.740719]  start_secondary+0x185/0x1a0
> > [635607.740722]  secondary_startup_64+0xa5/0xb0
> > [635607.740731] p0xe0: hw csum failure
> > [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> > [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> > [635607.740746] Call Trace:
> > [635607.740747]  <IRQ>
> > [635607.740750]  dump_stack+0x5c/0x7b
> > [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> > [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> > [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> > [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> > [635607.740774]  ip6_input_finish+0xc0/0x460
> > [635607.740776]  ip6_input+0x2b/0x90
> > [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> > [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> > [635607.740783]  ? udp6_lib_lookup_skb+0x59/0x80
> > [635607.740785]  __netif_receive_skb_core+0x455/0xb30
> > [635607.740788]  ? ipv6_gro_receive+0x1a8/0x390
> > [635607.740790]  ? netif_receive_skb_internal+0x24/0xb0
> > [635607.740792]  netif_receive_skb_internal+0x24/0xb0
> > [635607.740793]  napi_gro_frags+0x165/0x210
> > [635607.740796]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> > [635607.740802]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> > [635607.740807]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> > [635607.740810]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> > [635607.740811]  net_rx_action+0xe0/0x2e0
> > [635607.740813]  __do_softirq+0xd8/0x2e5
> > [635607.740816]  irq_exit+0xb4/0xc0
> > [635607.740817]  do_IRQ+0x85/0xd0
> > [635607.740820]  common_interrupt+0xf/0xf
> > [635607.740821]  </IRQ>
> > [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> > [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> > [635607.740848] RSP: 0018:ffffa5c206353ea8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
> > [635607.740849] RAX: ffff8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 000000000000001f
> > [635607.740850] RDX: 00024214f597c5b0 RSI: 0000000000020780 RDI: 0000000000000000
> > [635607.740851] RBP: 0000000000000004 R08: 002542bfbefa99fa R09: 00000000ffffffff
> > [635607.740852] R10: ffffa5c206353e88 R11: 00000000000000c5 R12: ffffffffaf0aaf78
> > [635607.740853] R13: ffff8d72ffd297d8 R14: 0000000000000000 R15: 00024214f58c2ed5
> > [635607.740855]  ? cpuidle_enter_state+0x91/0x2a0
> > [635607.740857]  do_idle+0x1d0/0x240
> > [635607.740859]  cpu_startup_entry+0x5f/0x70
> > [635607.740861]  start_secondary+0x185/0x1a0
> > [635607.740863]  secondary_startup_64+0xa5/0xb0
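
For readers following the bisected commit quoted above: the idea of
computing the checksum of the trimmed part and subtracting it from the
full-packet sum can be sketched in userspace Python. This is not the
kernel implementation of pskb_trim_rcsum(); the function names are
hypothetical, and the sketch assumes the full-packet sum covers exactly
the bytes in the buffer, starting at an even offset.

```python
def csum_partial(data: bytes) -> int:
    """16-bit one's-complement sum of data, taken as big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_sub(a: int, b: int) -> int:
    """One's-complement subtraction a - b, computed as a + ~b."""
    total = a + (~b & 0xFFFF)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def swap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)


def trim_csum(full_csum: int, packet: bytes, new_len: int) -> int:
    """Sum of packet[:new_len], derived from the sum over the whole packet.

    Only the trimmed tail is summed, which is cheap when the tail is short
    (for example NIC zero padding on a small last fragment).
    """
    tail_sum = csum_partial(packet[new_len:])
    if new_len % 2:
        # The tail starts at an odd offset in the packet, so its bytes sit
        # in the opposite halves of each 16-bit word: swap before subtracting.
        tail_sum = swap16(tail_sum)
    return csum_sub(full_csum, tail_sum)
```

The result matches a full recomputation over the kept bytes only when
full_csum really is the sum of the bytes in the buffer. If the value
reported for the packet covers different bytes than the stack assumes,
the subtraction carries that error forward, which is one way such a
change can expose a latent driver or NIC problem.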
