Message-ID: <6fdd8709-6aa5-8562-6c7b-959347f4dcc5@itcare.pl>
Date: Thu, 22 Nov 2018 02:25:23 +0100
From: Paweł Staszewski <pstaszewski@...are.pl>
To: Cong Wang <xiyou.wangcong@...il.com>,
Herbert Xu <herbert@...dor.apana.org.au>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Tom Herbert <tom@...bertland.com>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [Patch net] net: invert the check of detecting hardware RX
checksum fault
On 16.11.2018 at 21:06, Cong Wang wrote:
> On Thu, Nov 15, 2018 at 8:50 PM Herbert Xu <herbert@...dor.apana.org.au> wrote:
>> On Thu, Nov 15, 2018 at 06:23:38PM -0800, Cong Wang wrote:
>>>> Normally if the hardware's partial checksum is valid then we just
>>>> trust it and send the packet along. However, if the partial
>>>> checksum is invalid we don't trust it and we will compute the
>>>> whole checksum manually which is what ends up in sum.
>>> Not sure if I understand partial checksum here, but it is the
>>> CHECKSUM_COMPLETE case which I am trying to fix, not
>>> CHECKSUM_PARTIAL.
>> What I meant by partial checksum is the checksum produced by the
>> hardware on RX. In the kernel we call that CHECKSUM_COMPLETE.
>> CHECKSUM_PARTIAL is the absence of the substantial part of the
>> checksum which is something we use in the kernel primarily for TX.
>>
>> Yes the names are confusing :)
> Yeah, understood. The hardware provides skb->csum in this case, but
> we keep adjusting it each time we change skb->data.
>
>
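To illustrate the adjustment described above, here is a minimal userspace sketch, not kernel code. It assumes a 16-bit ones' complement sum over big-endian words; the helper names (fold16, sum16, sub16, sum_after_pull) are hypothetical. In the kernel, helpers such as skb_postpull_rcsum() play this role for skb->csum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum of a buffer read as big-endian 16-bit words. */
static uint16_t sum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = fold16(sum + (((uint32_t)p[i] << 8) | p[i + 1]));
    if (i < len) /* odd trailing byte, zero padded */
        sum = fold16(sum + ((uint32_t)p[i] << 8));
    return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b is computed as a + ~b. */
static uint16_t sub16(uint16_t a, uint16_t b)
{
    return fold16((uint32_t)a + (uint16_t)~b);
}

/*
 * Hypothetical helper: given the sum over buf[0..len), return the sum
 * over buf[pull..len) without rescanning the remaining bytes.
 */
static uint16_t sum_after_pull(uint16_t full, const uint8_t *buf,
                               size_t len, size_t pull)
{
    assert(pull <= len && pull % 2 == 0); /* sketch: even offsets only */
    return sub16(full, sum16(buf, pull));
}
```

The point of the sketch is that only the pulled header bytes are summed again; the sum over the rest of the packet is derived from the value the hardware supplied.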
>>> So, in other words, a checksum *match* is intended to detect
>>> this HW RX checksum fault?
>> Correct. Or more likely it's probably a bug in either the driver
>> or, if there is overlaying code such as VLAN, in that code.
>>
>> Basically if the RX checksum is buggy, it's much more likely to
>> cause a valid packet to be rejected than to cause an invalid packet
>> to be accepted, because we still verify that checksum against the
>> pseudoheader. So we only attempt to catch buggy hardware/drivers
>> by doing a second manual verification for the case where the packet
>> is flagged as invalid.
> Hmm, now I see how it works. Actually it uses the difference between
> these two checks as the difference between the hardware checksum and
> skb_checksum().
>
> I will send a patch to add a comment there to avoid confusion.
>
>
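The two-step check discussed above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel's __skb_checksum_complete(): the names rx_verify, sums_to_valid and the rx_verdict values are hypothetical, and a packet is treated as valid when its 16-bit ones' complement sum (checksum field included) plus the pseudoheader sum folds to 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rx_verdict { RX_OK, RX_OK_HW_FAULT, RX_BAD };

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum of a buffer read as big-endian 16-bit words. */
static uint16_t sum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = fold16(sum + (((uint32_t)p[i] << 8) | p[i + 1]));
    if (i < len) /* odd trailing byte, zero padded */
        sum = fold16(sum + ((uint32_t)p[i] << 8));
    return (uint16_t)sum;
}

/* Valid when payload sum plus pseudoheader sum folds to 0xffff. */
static int sums_to_valid(uint16_t payload_sum, uint16_t pseudo)
{
    return fold16((uint32_t)payload_sum + pseudo) == 0xffffu;
}

/*
 * Trust the hardware-supplied sum if it verifies. Only when it does not,
 * recompute in software; a software match then means the packet is fine
 * and the hardware (or driver) sum was wrong.
 */
static enum rx_verdict rx_verify(uint16_t hw_sum, uint16_t pseudo,
                                 const uint8_t *data, size_t len)
{
    assert(data != NULL);
    if (sums_to_valid(hw_sum, pseudo))
        return RX_OK;
    if (sums_to_valid(sum16(data, len), pseudo))
        return RX_OK_HW_FAULT; /* the "hw csum failure" case */
    return RX_BAD;
}
```

In this sketch a software match after a hardware mismatch is what flags the fault, which is why a match, rather than a mismatch, in the second verification points at the hardware or driver.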
>>> Sure, my case is nearly the same as Pawel's, except I have no vlan:
>>> https://marc.info/?l=linux-netdev&m=154086647601721&w=2
>> Can you please provide your backtrace?
> I already did:
> https://marc.info/?l=linux-netdev&m=154092211305599&w=2
>
> Note, the offending commit has been backported to 4.14, which
> is why I saw this warning. I have no idea why it was backported
> in the first place; it is just an optimization and doesn't fix any bug,
> IMHO.
>
> Also, it is much harder for me to reproduce it than for Pawel, who
> saw the warning every second. Sometimes I need 1 hour to trigger
> it; sometimes other people here need 10+ hours to trigger it.
By the way, I changed the network controller for the VLANs where I was
receiving RX csum failures to an 82599 with the ixgbe driver.
For comparison, with Mellanox:
[91584.359273] vlan980: hw csum failure
[91584.359278] CPU: 54 PID: 0 Comm: swapper/54 Not tainted 4.20.0-rc1+ #2
[91584.359279] Call Trace:
[91584.359282] <IRQ>
[91584.359290] dump_stack+0x46/0x5b
[91584.359296] __skb_checksum_complete+0x9b/0xb0
[91584.359301] icmp_rcv+0x51/0x1f0
[91584.359305] ip_local_deliver_finish+0x49/0xd0
[91584.359307] ip_local_deliver+0xb7/0xe0
[91584.359309] ? ip_sublist_rcv_finish+0x50/0x50
[91584.359310] ip_rcv+0x96/0xc0
[91584.359313] __netif_receive_skb_one_core+0x4b/0x70
[91584.359315] netif_receive_skb_internal+0x2f/0xc0
[91584.359316] napi_gro_receive+0xb0/0xd0
[91584.359320] mlx5e_handle_rx_cqe+0x78/0xd0
[91584.359321] mlx5e_poll_rx_cq+0xc4/0x970
[91584.359323] mlx5e_napi_poll+0xab/0xcb0
[91584.359325] net_rx_action+0xd9/0x300
[91584.359328] __do_softirq+0xd3/0x2d9
[91584.359333] irq_exit+0x7a/0x80
[91584.359334] do_IRQ+0x72/0xc0
[91584.359336] common_interrupt+0xf/0xf
[91584.359337] </IRQ>
[91584.359340] RIP: 0010:mwait_idle+0x74/0x1b0
[91584.359342] Code: ae f0 31 d2 65 48 8b 04 25 80 4c 01 00 48 89 d1 0f 01 c8 48 8b 00 48 c1 e8 03 83 e0 01 0f 85 26 01 00 00 48 89 c1 fb 0f 01 c9 <65> 8b 2d 95 8e 6b 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 00 f0
[91584.359343] RSP: 0018:ffffc900034f3ec0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[91584.359344] RAX: 0000000000000000 RBX: 0000000000000036 RCX: 0000000000000000
[91584.359345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[91584.359346] RBP: 0000000000000036 R08: 0000000000000000 R09: 0000000000000000
[91584.359346] R10: 00000001008b49bb R11: 0000000000000c00 R12: 0000000000000000
[91584.359347] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[91584.359352] do_idle+0x19f/0x1c0
[91584.359354] ? do_idle+0x4/0x1c0
[91584.359355] cpu_startup_entry+0x14/0x20
[91584.359360] start_secondary+0x165/0x190
[91584.359364] secondary_startup_64+0xa4/0xb0
With Intel there are no errors.
>
> Let me see if I can add vlan on my side to make it more reproducible,
> it seems hard as our switch doesn't use vlan either.
>
> We have warnings with conntrack involved too, I can provide it too
> if you are interested.
>
> I tend to revert it for -stable, at least that is what I plan to do
> on my side unless there is a fix coming soon.
>
> Thanks.
>