netdev - Re: [Patch net] net: invert the check of detecting hardware RX checksum fault

Message-ID: <6fdd8709-6aa5-8562-6c7b-959347f4dcc5@itcare.pl>
Date:   Thu, 22 Nov 2018 02:25:23 +0100
From:   Paweł Staszewski <pstaszewski@...are.pl>
To:     Cong Wang <xiyou.wangcong@...il.com>,
        Herbert Xu <herbert@...dor.apana.org.au>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Tom Herbert <tom@...bertland.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Patch net] net: invert the check of detecting hardware RX
 checksum fault


On 16.11.2018 at 21:06, Cong Wang wrote:
> On Thu, Nov 15, 2018 at 8:50 PM Herbert Xu <herbert@...dor.apana.org.au> wrote:
>> On Thu, Nov 15, 2018 at 06:23:38PM -0800, Cong Wang wrote:
>>>> Normally if the hardware's partial checksum is valid then we just
>>>> trust it and send the packet along.  However, if the partial
>>>> checksum is invalid we don't trust it and we will compute the
>>>> whole checksum manually which is what ends up in sum.
>>> Not sure if I understand partial checksum here, but it is the
>>> CHECKSUM_COMPLETE case which I am trying to fix, not
>>> CHECKSUM_PARTIAL.
>> What I meant by partial checksum is the checksum produced by the
>> hardware on RX.  In the kernel we call that CHECKSUM_COMPLETE.
>> CHECKSUM_PARTIAL is the absence of the substantial part of the
>> checksum which is something we use in the kernel primarily for TX.
>>
>> Yes the names are confusing :)
> Yeah, understood. The hardware provides skb->csum in this case, but
> we keep adjusting it each time we change skb->data.
>
>
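Cong's remark that skb->csum has to be adjusted every time skb->data moves comes down to one's-complement arithmetic (RFC 1071). The sum over the whole frame minus the sum over the pulled-off bytes equals the sum over what remains. The Python sketch below illustrates this under simplifying assumptions. It uses 16-bit folded sums rather than the kernel's 32-bit __wsum and assumes the pulled header has even length. Its helpers only mirror the kernel's csum_add()/csum_sub() in spirit, and the 4-byte 802.1Q tag (VID 980) is a hypothetical input. In the kernel, receive paths make this adjustment with helpers such as skb_postpull_rcsum(). A path that pulls bytes without it leaves a stale skb->csum.

```python
# Sketch (not kernel code): 16-bit one's-complement sums per RFC 1071.
# csum_add/csum_sub mirror the kernel helpers of the same name only in
# spirit; they are simplified stand-ins working on folded 16-bit values.


def csum_add(a: int, b: int) -> int:
    """One's-complement addition of two 16-bit values (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    """One's-complement subtraction: add the bitwise complement of b."""
    return csum_add(a, ~b & 0xFFFF)


def ones_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words; odd tail zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


# Hypothetical frame: a 4-byte 802.1Q tag (TPID 0x8100, VID 980)
# followed by 20 bytes of payload.
tag = b"\x81\x00\x03\xd4"
rest = bytes(range(1, 21))

full = ones_sum(tag + rest)               # stand-in for a sum covering the tag
adjusted = csum_sub(full, ones_sum(tag))  # sum after pulling the tag off

print(hex(full))                   # 0xe942
print(hex(adjusted))               # 0x646e
print(adjusted == ones_sum(rest))  # True: the adjusted sum covers only "rest"
print(full == ones_sum(rest))      # False: skipping the adjustment leaves a stale sum
```

The even-length assumption matters. Pulling an odd number of bytes shifts the byte lanes, so the partial sum would have to be byte-swapped, and this sketch does not handle that case.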
>>> So, in other words, a checksum *match* is what is intended to detect
>>> this HW RX checksum fault?
>> Correct.  Though more likely it's a bug in either the driver or, if
>> there is overlaying code such as VLAN, in that code.
>>
>> Basically if the RX checksum is buggy, it's much more likely to
>> cause a valid packet to be rejected than to cause an invalid packet
>> to be accepted, because we still verify that checksum against the
>> pseudoheader.  So we only attempt to catch buggy hardware/drivers
>> by doing a second manual verification for the case where the packet
>> is flagged as invalid.
> Hmm, now I see how it works. Actually it uses the difference between
> these two checks to detect a mismatch between the hardware checksum
> and skb_checksum().
>
> I will send a patch to add a comment there to avoid confusion.
>
>
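Herbert describes the flow and Cong restates it. The kernel trusts a CHECKSUM_COMPLETE sum that verifies. If the sum does not verify, the kernel recomputes it in software before dropping the packet. If the software sum then verifies, the packet is fine and the hardware-provided sum was wrong, which is when "hw csum failure" is reported.

The Python below is a minimal sketch of that decision logic only, under these assumptions:
- It uses 16-bit folded sums.
- A packet counts as valid when everything, including any pseudo-header, sums to 0xFFFF.
- verify_rx(), build_packet() and the return labels are hypothetical.

It is not the kernel's implementation. In the kernel, the software re-check happens in __skb_checksum_complete(), the function that appears in the trace in this message.

```python
# Sketch (not kernel code) of the CHECKSUM_COMPLETE fault-detection flow.
# Assumptions: 16-bit folded one's-complement sums (RFC 1071) instead of
# the kernel's 32-bit __wsum; verify_rx(), build_packet() and the return
# labels are hypothetical.


def csum_add(a: int, b: int) -> int:
    """One's-complement addition of two 16-bit values (end-around carry)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ones_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words; odd tail zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total = csum_add(total, (data[i] << 8) | data[i + 1])
    return total


def verify_rx(packet: bytes, pseudo_sum: int, hw_csum: int) -> str:
    """Classify a received packet.

    hw_csum stands in for the hardware-provided skb->csum; pseudo_sum is
    the sum of any pseudo-header the protocol covers (0 if none). A packet
    is valid when the combined sum is 0xFFFF, i.e. it folds to zero.
    """
    # Fast path: the hardware sum verifies, so it is trusted as-is.
    if csum_add(hw_csum, pseudo_sum) == 0xFFFF:
        return "accept"
    # Hardware says "bad": recompute over the actual bytes before dropping.
    sw_csum = ones_sum(packet)
    if csum_add(sw_csum, pseudo_sum) == 0xFFFF:
        # A software *match* means the hardware sum (or code that adjusted
        # it, e.g. when pulling a VLAN tag) was wrong: "hw csum failure".
        return "accept (hw csum failure)"
    # Both agree the packet is bad: genuinely corrupt, so drop it.
    return "drop"


def build_packet(pseudo_sum: int, data: bytes) -> bytes:
    """Prepend a 2-byte checksum field that makes the packet verify."""
    partial = csum_add(ones_sum(b"\x00\x00" + data), pseudo_sum)
    return (~partial & 0xFFFF).to_bytes(2, "big") + data


pseudo_sum = 0x1234  # arbitrary illustrative value
packet = build_packet(pseudo_sum, b"hello world!")
tag_sum = ones_sum(b"\x81\x00\x03\xd4")  # a 4-byte tag wrongly left in the sum

print(verify_rx(packet, pseudo_sum, ones_sum(packet)))
print(verify_rx(packet, pseudo_sum, csum_add(ones_sum(packet), tag_sum)))
corrupt = bytes([packet[0], packet[1] ^ 0x01]) + packet[2:]
print(verify_rx(corrupt, pseudo_sum, ones_sum(corrupt)))
```

The three calls print "accept", "accept (hw csum failure)" and "drop". The sketch also shows the asymmetry Herbert points out. Suppose a buggy hardware sum happened to verify for a corrupt packet (for example, passing ones_sum(packet) together with corrupt). The fast path would then accept it without any software re-check. Only the "valid packet flagged as bad" direction is caught.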
>>> Sure, my case is nearly the same as Pawel's, except I have no vlan:
>>> https://marc.info/?l=linux-netdev&m=154086647601721&w=2
>> Can you please provide your backtrace?
> I already did:
> https://marc.info/?l=linux-netdev&m=154092211305599&w=2
>
> Note, the offending commit has been backported to 4.14, which
> is why I saw this warning. I have no idea why it was backported
> in the first place; it is just an optimization and doesn't fix any
> bug, IMHO.
>
> Also, it is much harder for me to reproduce it than for Pawel, who
> saw the warning every second. Sometimes I need 1 hour to trigger
> it; sometimes other people here need 10+ hours to trigger it.

By the way, I changed the network controller for the VLANs where I was
receiving RX csum failures to an 82599 with the ixgbe driver.

With Mellanox:

[91584.359273] vlan980: hw csum failure
[91584.359278] CPU: 54 PID: 0 Comm: swapper/54 Not tainted 4.20.0-rc1+ #2
[91584.359279] Call Trace:
[91584.359282]  <IRQ>
[91584.359290]  dump_stack+0x46/0x5b
[91584.359296]  __skb_checksum_complete+0x9b/0xb0
[91584.359301]  icmp_rcv+0x51/0x1f0
[91584.359305]  ip_local_deliver_finish+0x49/0xd0
[91584.359307]  ip_local_deliver+0xb7/0xe0
[91584.359309]  ? ip_sublist_rcv_finish+0x50/0x50
[91584.359310]  ip_rcv+0x96/0xc0
[91584.359313]  __netif_receive_skb_one_core+0x4b/0x70
[91584.359315]  netif_receive_skb_internal+0x2f/0xc0
[91584.359316]  napi_gro_receive+0xb0/0xd0
[91584.359320]  mlx5e_handle_rx_cqe+0x78/0xd0
[91584.359321]  mlx5e_poll_rx_cq+0xc4/0x970
[91584.359323]  mlx5e_napi_poll+0xab/0xcb0
[91584.359325]  net_rx_action+0xd9/0x300
[91584.359328]  __do_softirq+0xd3/0x2d9
[91584.359333]  irq_exit+0x7a/0x80
[91584.359334]  do_IRQ+0x72/0xc0
[91584.359336]  common_interrupt+0xf/0xf
[91584.359337]  </IRQ>
[91584.359340] RIP: 0010:mwait_idle+0x74/0x1b0
[91584.359342] Code: ae f0 31 d2 65 48 8b 04 25 80 4c 01 00 48 89 d1 0f 01 c8 48 8b 00 48 c1 e8 03 83 e0 01 0f 85 26 01 00 00 48 89 c1 fb 0f 01 c9 <65> 8b 2d 95 8e 6b 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 00 f0
[91584.359343] RSP: 0018:ffffc900034f3ec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[91584.359344] RAX: 0000000000000000 RBX: 0000000000000036 RCX: 0000000000000000
[91584.359345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[91584.359346] RBP: 0000000000000036 R08: 0000000000000000 R09: 0000000000000000
[91584.359346] R10: 00000001008b49bb R11: 0000000000000c00 R12: 0000000000000000
[91584.359347] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[91584.359352]  do_idle+0x19f/0x1c0
[91584.359354]  ? do_idle+0x4/0x1c0
[91584.359355]  cpu_startup_entry+0x14/0x20
[91584.359360]  start_secondary+0x165/0x190
[91584.359364]  secondary_startup_64+0xa4/0xb0


With Intel, no errors.


>
> Let me see if I can add a VLAN on my side to make it more reproducible;
> it seems hard, as our switch doesn't use VLANs either.
>
> We also have warnings with conntrack involved; I can provide those
> too if you are interested.
>
> I am inclined to revert it for -stable; at least, that is what I plan
> to do on my side unless a fix comes soon.
>
> Thanks.
>