Date:   Mon, 29 Oct 2018 19:53:18 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>,
        Paweł Staszewski <pstaszewski@...are.pl>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Latest net-next kernel 4.19.0+



On 10/29/2018 07:27 PM, Cong Wang wrote:
> Hi,
> 
> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@...are.pl> wrote:
>>
>> Sorry not complete - followed by hw csum:
>>
>> [  342.190831] vlan1490: hw csum failure
>> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>> [  342.190836] Call Trace:
>> [  342.190839]  <IRQ>
>> [  342.190849]  dump_stack+0x46/0x5b
>> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
>> [  342.190859]  tcp_v4_rcv+0xef/0x960
>> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
>> [  342.190866]  ip_local_deliver+0x5e/0xe0
>> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
>> [  342.190870]  ip_rcv+0x41/0xc0
>> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
>> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
>> [  342.190879]  napi_gro_receive+0xb7/0xe0
>> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
>> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
>> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
> 
> 
> We got exactly the same backtrace in our data center. However,
> it is not easy for us to reproduce it, do you have any clue to reproduce it?
> 
> If you do, try to tcpdump the packets triggering this warning, it could
> be useful for debugging.
> 
> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
> occurs. We tried to revert the offending commit 88078d98d1bb, it
> disappears. So it is likely that commit 88078d98d1bb introduces
> more troubles than the one fixed by d55bef5059dd057bd.
> 

Or it could be that the mlx5 driver is buggy when dealing with VLAN tags.

It uses both vlan_tci (hardware VLAN offload) in the skb _and_ this piece of code in mlx5e_handle_csum():

		if (network_depth > ETH_HLEN)
			/* CQE csum is calculated from the IP header and does
			 * not cover VLAN headers (if present). This will add
			 * the checksum manually.
			 */
			skb->csum = csum_partial(skb->data + ETH_HLEN,
						 network_depth - ETH_HLEN,
						 skb->csum);


That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.
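
To make the concern concrete, here is a minimal userspace sketch of the arithmetic only. It is not kernel code: the toy_* helpers are hypothetical stand-ins for csum_partial()/csum_sub(), and the byte values are made up (the TCI is chosen to match vlan1490 from the log). It assumes the hardware sum covers only the L3 bytes, as the comment in mlx5e_handle_csum() says. Once the 4 VLAN bytes are folded into the sum, the value only matches the packet data again if those same bytes are subtracted when the header is removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t toy_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical stand-in for csum_partial(): add buf[0..len) to sum,
 * reading big-endian 16-bit words. Not the kernel implementation.
 */
static uint32_t toy_csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	sum = toy_csum_fold(sum);
	for (i = 0; i + 1 < len; i += 2)
		sum = toy_csum_fold(sum + (((uint32_t)buf[i] << 8) | buf[i + 1]));
	if (i < len)	/* odd trailing byte is padded with zero */
		sum = toy_csum_fold(sum + ((uint32_t)buf[i] << 8));
	return sum;
}

/* Hypothetical stand-in for csum_sub(): ones' complement subtraction. */
static uint32_t toy_csum_sub(uint32_t sum, uint32_t part)
{
	return toy_csum_fold((uint32_t)toy_csum_fold(sum) +
			     (uint16_t)~toy_csum_fold(part));
}

static void toy_vlan_csum_demo(void)
{
	/* Made-up bytes: TCI for VLAN 1490, then the IPv4 ethertype. */
	static const uint8_t vlan[4] = { 0x05, 0xd2, 0x08, 0x00 };
	/* Made-up start of an L3 header; this is all the "hardware" summed. */
	static const uint8_t l3[8] = { 0x45, 0x00, 0x00, 0x28,
				       0x12, 0x34, 0x40, 0x00 };

	uint32_t hw = toy_csum_partial(l3, sizeof(l3), 0);
	/* What the driver snippet does: fold the VLAN bytes in. */
	uint32_t with_vlan = toy_csum_partial(vlan, sizeof(vlan), hw);

	/* If the VLAN header goes away and nobody adjusts the sum,
	 * the stored value no longer matches the remaining data.
	 */
	assert(with_vlan != hw);

	/* Subtracting the same 4 bytes restores the match. */
	assert(toy_csum_sub(with_vlan,
			    toy_csum_partial(vlan, sizeof(vlan), 0)) == hw);
}
```

Calling toy_vlan_csum_demo() from a test program exercises both cases. Whether the real receive path ends up in the unadjusted state depends on which untag path the skb takes, which this sketch does not model.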
