Message-ID: <68078b1e6d1a5_396ca0294b4@willemb.c.googlers.com.notmuch>
Date: Tue, 22 Apr 2025 08:27:10 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Qiyu Yan <yanqiyu17@...ls.ucas.ac.cn>, 
 Tariq Toukan <tariqt@...dia.com>, 
 Saeed Mahameed <saeedm@...dia.com>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>, 
 Simon Horman <horms@...nel.org>, 
 Eric Dumazet <edumazet@...gle.com>, 
 "David S. Miller" <davem@...emloft.net>
Cc: netdev@...r.kernel.org
Subject: Re: DNAT'ed traffic from ConnectX-4 card triggers "hw csum failure"
 on veth interface

Qiyu Yan wrote:
> Hi all,
> 
> Apologies for the broad CC—I'm unsure which component is related to the 
> issue, but I've gathered more details since my last report.
> 
> After boot or after resetting the WARN_ONCE flag, I consistently observe 
> the following in `dmesg`:
> 
> eth0: hw csum failure
> skb len=52 headroom=98 headlen=52 tailroom=1578
> mac=(64,14) mac_len=14 net=(78,20) trans=98
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0x98009d14 start=40212 offset=38912 ip_summed=2 complete_sw=0 
> valid=0 level=0)
> hash(0x2135374 sw=0 l4=1) proto=0x0800 pkttype=0 iif=2
> priority=0x0 mark=0x0 alloc_cpu=20 vlan_all=0x0
> encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
> dev name=eth0 feat=0x000061164fdd09e9
> skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> skb headroom: 00000030: ba d7 32 44 dd 39 7e b7 bb bd 2e d5 88 e5 2d 00
> skb headroom: 00000040: 9e 52 9b 58 46 89 aa 93 51 02 83 7e 08 00 45 00
> skb headroom: 00000050: 00 48 d3 6d 00 00 3f 11 93 48 0a 00 00 7a 0a 58
> skb headroom: 00000060: 00 1e
> skb linear:   00000000: e2 e4 00 35 00 34 92 e9 f4 39 01 00 00 01 00 00
> skb linear:   00000010: 00 00 00 00 06 72 65 70 6f 72 74 07 6d 65 65 74
> skb linear:   00000020: 69 6e 67 07 74 65 6e 63 65 6e 74 03 63 6f 6d 00
> skb linear:   00000030: 00 01 00 01
> ... large tailroom
> CPU: 20 UID: 0 PID: 0 Comm: swapper/20 Tainted: G OE      
> 6.14.2-300.fc42.x86_64 #1
> Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EPYCD8, 
> BIOS L2.52 11/25/2020
> Call Trace:
>   <IRQ>
>   dump_stack_lvl+0x5d/0x80
>   __skb_checksum_complete+0xeb/0x110
>   ? __pfx_csum_partial_ext+0x10/0x10
>   ? __pfx_csum_block_add_ext+0x10/0x10
>   udp4_csum_init+0x1dc/0x2f0
>   __udp4_lib_rcv+0xc8/0x750
>   ? srso_return_thunk+0x5/0x5f
>   ? raw_v4_input+0x14a/0x270
>   ip_protocol_deliver_rcu+0xcb/0x1a0
>   ip_local_deliver_finish+0x76/0xa0
>   ip_local_deliver+0xfa/0x110
>   __netif_receive_skb_one_core+0x87/0xa0
>   process_backlog+0x87/0x130
>   __napi_poll+0x31/0x1b0
>   ? srso_return_thunk+0x5/0x5f
>   net_rx_action+0x333/0x420
>   handle_softirqs+0xf2/0x340
>   ? srso_return_thunk+0x5/0x5f
>   ? srso_return_thunk+0x5/0x5f
>   __irq_exit_rcu+0xcb/0xf0
>   common_interrupt+0x85/0xa0
>   </IRQ>
>   <TASK>
>   asm_common_interrupt+0x26/0x40
> RIP: 0010:cpuidle_enter_state+0xcc/0x660
> Code: 00 00 e8 67 28 fb fe e8 d2 ed ff ff 49 89 c4 0f 1f 44 00 00 31 ff 
> e8 73 61 f9 fe 45 84 ff 0f 85 02 02 00 00 fb 0f 1f 44 00 00 <85> ed 0f 
> 88 d3 01 00 00 4c 63 f5 49 83 fe 0a 0f 83 9f 04 00 00 49
> RSP: 0018:ffffa79d003afe50 EFLAGS: 00000246
> RAX: ffff96440ca00000 RBX: ffff962542b89800 RCX: 0000000000000000
> RDX: 000051a9557f7bf1 RSI: 000000003152c088 RDI: 0000000000000000
> RBP: 0000000000000002 R08: ffffffee4d207359 R09: ffff96440ca315e0
> R10: 000051bb10ea059b R11: 0000000000000000 R12: 000051a9557f7bf1
> R13: ffffffffa7b15160 R14: 0000000000000002 R15: 0000000000000000
> 
> From inspecting the SKB, the packet comes from a host (10.0.0.122) 
> connected via a ConnectX-4 Lx NIC to our server. It is DNAT'ed via 
> iptables from 10.0.0.1:53 to a container at 10.88.0.30:53.
> 
> Traffic path:
> 
>      10.0.0.122 --> [CX4 NIC 10.0.0.1/16]
>                        |
>                iptables DNAT (10.0.0.1:53 -> 10.88.0.30:53)
>                        |
>                  [linux bridge (podman0 10.88.0.1/16)]
>                        |
>                    [veth pair]
>                        |
>                  [eth0 inside container]
> 
> The warning is triggered when the packet arrives at eth0 inside the 
> container.
> 
> What's suspicious is the reported checksum info:
> 
>      csum(0x9800a314 start=41748 offset=38912 ip_summed=2 ...)
> 
> Here, start and offset are far beyond the size of the skb. This seems 
> like an invalid buffer?

No, these fields are a union. With CHECKSUM_COMPLETE (ip_summed=2), only
csum is meaningful, so you can ignore the start and offset values:

        union { 
                __wsum          csum;
                struct {
                        __u16   csum_start;
                        __u16   csum_offset;
                };
        };

> And I suspect that during DNAT and/or forwarding 
> through the bridge and veth, the checksum status is not properly cleared 
> or recalculated.

That sounds most likely. Something in the path is pushing, pulling, or
modifying a header without updating skb->csum correctly.

You can try capturing the packet earlier in the receive path, in the
init namespace. Or capture and log it at more points along the path
using bpftrace instead of tcpdump.

> The NIC is:
> $ ethtool -i mlx-p1
> driver: mlx5_core
> version: 6.14.2-300.fc42.x86_64
> firmware-version: 14.32.1900 (MT_2420110034)
> expansion-rom-version:
> bus-info: 0000:c1:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
> 
> 
> Best,
> Qiyu
> 


