Message-ID: <CAG76SjY7fnFdgamBELATyO8NGnyNFYiX33SgLE6-q=eoBM8jKg@mail.gmail.com>
Date:   Mon, 29 Oct 2018 23:09:45 -0700
From:   Dimitris Michailidis <dmichail@...gle.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        Paweł Staszewski <pstaszewski@...are.pl>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Latest net-next kernel 4.19.0+

On Mon, Oct 29, 2018 at 8:52 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>
> On 10/29/2018 07:53 PM, Eric Dumazet wrote:
>>
>>
>> On 10/29/2018 07:27 PM, Cong Wang wrote:
>>> Hi,
>>>
>>> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@...are.pl> wrote:
>>>>
>>>> Sorry not complete - followed by hw csum:
>>>>
>>>> [  342.190831] vlan1490: hw csum failure
>>>> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>>>> [  342.190836] Call Trace:
>>>> [  342.190839]  <IRQ>
>>>> [  342.190849]  dump_stack+0x46/0x5b
>>>> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
>>>> [  342.190859]  tcp_v4_rcv+0xef/0x960
>>>> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
>>>> [  342.190866]  ip_local_deliver+0x5e/0xe0
>>>> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
>>>> [  342.190870]  ip_rcv+0x41/0xc0
>>>> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
>>>> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
>>>> [  342.190879]  napi_gro_receive+0xb7/0xe0
>>>> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
>>>> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
>>>> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
>>>
>>>
>>> We got exactly the same backtrace in our data center. However,
>>> it is not easy for us to reproduce it, do you have any clue to reproduce it?
>>>
>>> If you do, try to tcpdump the packets triggering this warning, it could
>>> be useful for debugging.
>>>
>>> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
>>> occurs. We tried to revert the offending commit 88078d98d1bb, it
>>> disappears. So it is likely that commit 88078d98d1bb introduces
>>> more troubles than the one fixed by d55bef5059dd057bd.
>>>
>>
>> Or this could be that mlx5 driver is buggy when dealing with VLAN tags.
>>
>> It both uses vlan_tci (hardware vlan offload) in skb _and_ this piece of code in mlx5e_handle_csum()
>>
>>               if (network_depth > ETH_HLEN)
>>                       /* CQE csum is calculated from the IP header and does
>>                        * not cover VLAN headers (if present). This will add
>>                        * the checksum manually.
>>                        */
>>                       skb->csum = csum_partial(skb->data + ETH_HLEN,
>>                                                network_depth - ETH_HLEN,
>>                                                skb->csum);
>>
>>
>> That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.
>>
>
> Bug might be in NETIF_F_RXFCS mlx5 handling btw...
>
> Code does :
>
> if (unlikely(netdev->features & NETIF_F_RXFCS))
>      skb->csum = csum_add(skb->csum,
>                           (__force __wsum)mlx5e_get_fcs(skb));
>
> But Dimitris told us that we need to take into account if FCS starts at odd or even offset.
>
> ->
> if (unlikely(netdev->features & NETIF_F_RXFCS))
>      skb->csum = csum_block_add(skb->csum,
>                                 (__force __wsum)mlx5e_get_fcs(skb),
>                                 skb->len);
>

Indeed this is a bug. I would expect it to produce frequent errors,
though, as many odd-length packets would trigger it. Do you have RXFCS
enabled? Regardless, how frequently do you see the problem?
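
To illustrate why the parity of the offset matters, here is a minimal
user-space sketch, not the kernel code. The helper names (fold16, sum16,
add16, swap16, block_add16) are made up for this example, and it works on
folded 16-bit sums over big-endian words rather than on the kernel's
32-bit __wsum. When a block starts at an odd offset, its bytes fall into
the opposite halves of the 16-bit words, so its partial sum has to be
byte-swapped before it is added. That is the adjustment csum_block_add()
makes based on its offset argument.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Ones' complement sum of a buffer read as big-endian 16-bit words,
 * as if the buffer started at an even offset. An odd trailing byte is
 * padded with a zero low byte. Intended for short buffers only.
 */
static uint16_t sum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;
    return fold16(sum);
}

/* Plain ones' complement addition; the analogue of csum_add(). */
static uint16_t add16(uint16_t a, uint16_t b)
{
    return fold16((uint32_t)a + b);
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)(v << 8 | v >> 8);
}

/*
 * Hypothetical analogue of csum_block_add(): sum2 covers a block that
 * starts at byte 'offset' of the data covered by sum. At an odd offset
 * the block's bytes sit in swapped word halves, so swap before adding.
 */
static uint16_t block_add16(uint16_t sum, uint16_t sum2, size_t offset)
{
    if (offset & 1)
        sum2 = swap16(sum2);
    return add16(sum, sum2);
}
```

With a 3-byte payload followed by a 4-byte FCS, add16() of the two
partial sums differs from sum16() over all 7 bytes, while block_add16()
with offset 3 matches it. With a 4-byte payload both agree, which is why
only odd-length packets would be affected.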

There is some other questionable code in the driver's RXFCS implementation.
Code like

                return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);

doesn't work on processors with strict alignment requirements, because
skb->data + skb->len - ETH_FCS_LEN is not guaranteed to be 4-byte aligned.
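
For illustration only, here is a user-space sketch of reading the
trailing four bytes without assuming alignment. The function names and
the FCS_LEN macro are made up for the example, and this is not a proposed
patch; in the kernel, the get_unaligned() family of helpers exists for
this purpose. The first function returns the bytes in memory order, as
the cast above would; the second assembles a big-endian value byte by
byte, so it is independent of host endianness.

```c
#include <stdint.h>
#include <string.h>

#define FCS_LEN 4 /* stands in for ETH_FCS_LEN in this sketch */

/*
 * Copy the last FCS_LEN bytes of the buffer into a local variable.
 * memcpy() places no alignment requirement on the source pointer,
 * and the result keeps the bytes in memory (network) order.
 * Assumes len >= FCS_LEN.
 */
static uint32_t read_fcs_raw(const uint8_t *data, size_t len)
{
    uint32_t fcs;

    memcpy(&fcs, data + len - FCS_LEN, sizeof(fcs));
    return fcs;
}

/*
 * Interpret the same four bytes as a big-endian number, one byte at
 * a time, so neither alignment nor host endianness matters.
 * Assumes len >= FCS_LEN.
 */
static uint32_t read_fcs_be32(const uint8_t *data, size_t len)
{
    const uint8_t *p = data + len - FCS_LEN;

    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | p[3];
}
```

Both functions accept a buffer whose last four bytes start at any
address, including an odd one.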
