Message-ID: <20210702143642.Horde.PFbG3LFNTZ3wp0TYiBRGsCM@mail.your-server.de>
Date:   Fri, 02 Jul 2021 14:36:42 +0200
From:   Matthias Treydte <mt@...dheinz.de>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     David Ahern <dsahern@...il.com>, stable@...r.kernel.org,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        regressions@...ts.linux.dev, davem@...emloft.net,
        yoshfuji@...ux-ipv6.org, dsahern@...nel.org
Subject: Re: [regression] UDP recv data corruption


Quoting Willem de Bruijn <willemdebruijn.kernel@...il.com>:

> That library does not enable UDP_GRO. You do not have any UDP based
> tunnel devices (besides vxlan) configured, either, right?

The configuration is really minimal by now; I also took the bonding
out of the equation. We have systemd configure "en*" with mDNS and DHCP
enabled and that's it. The problem remains.

I also found new hardware on my desk today (some Intel SoC), showing
exactly the same symptoms. So it really has nothing to do with the
hardware.

> It is also unlikely that the device has either of NETIF_F_GRO_FRAGLIST
> or NETIF_F_GRO_UDP_FWD configured. This can be checked with `ethtool
> -K $DEV`, shown as "rx-gro-list" and "rx-udp-gro-forwarding",
> respectively.

The full output of "ethtool -k enp5s0" from that SoC:

Features for enp5s0:
rx-checksumming: on
tx-checksumming: on
         tx-checksum-ipv4: off [fixed]
         tx-checksum-ip-generic: on
         tx-checksum-ipv6: off [fixed]
         tx-checksum-fcoe-crc: off [fixed]
         tx-checksum-sctp: on
scatter-gather: on
         tx-scatter-gather: on
         tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
         tx-tcp-segmentation: on
         tx-tcp-ecn-segmentation: off [fixed]
         tx-tcp-mangleid-segmentation: off
         tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]

That's the only NIC on this board:

# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
     link/ether 00:30:d6:24:99:67 brd ff:ff:ff:ff:ff:ff

> One possible short-term workaround is to disable GRO.

Indeed, "ethtool -K enp5s0 gro off" fixes the problem, and calling it with
"gro on" brings it back.
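
For anyone who wants to repeat the feature check on another box without reading the whole dump, here is a minimal sketch that pulls the relevant flags out of saved "ethtool -k" text. The function name parse_ethtool_features and the SAMPLE string are only illustrative (the sample lines are copied from the output above); it assumes the plain "name: on|off [fixed]" layout shown in this mail and does not call ethtool itself.

```python
def parse_ethtool_features(text: str) -> dict[str, tuple[bool, bool]]:
    """Map feature name -> (enabled, fixed) from `ethtool -k` text output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "name: value" line
        fields = value.split()
        if not fields or fields[0] not in ("on", "off"):
            continue  # e.g. the "Features for enp5s0:" header
        features[name] = (fields[0] == "on", "[fixed]" in fields)
    return features


# A few lines taken from the dump above; a real run would read the
# complete saved output instead.
SAMPLE = """\
Features for enp5s0:
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-gro-list: off
rx-udp-gro-forwarding: off
"""

features = parse_ethtool_features(SAMPLE)
for name in ("generic-receive-offload", "rx-gro-list", "rx-udp-gro-forwarding"):
    enabled, fixed = features[name]
    print(f"{name}: {'on' if enabled else 'off'}{' [fixed]' if fixed else ''}")
```

On this board that shows plain GRO enabled while both rx-gro-list and rx-udp-gro-forwarding are off, which matches the observation that toggling "gro" alone makes the corruption appear and disappear.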


And to answer Paolo's questions from his mail to the list (@Paolo: I'm
not subscribed, please also send to me directly so I don't miss your mail):

> Could you please:
> - tell how frequent is the pkt corruption, even a rough estimate of the
> frequency.

# journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
167

So there are 167 detected failures in 5 minutes, while the system is receiving
at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter
at least, but that seems about right).
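
To put those two numbers in relation, a rough back-of-the-envelope calculation, assuming each "Packet corrupt" log line stands for exactly one bad packet and that the 900 pkts/s rate held for the whole five minutes (neither is verified here; corruption_rate is just a helper name for this sketch):

```python
def corruption_rate(failures: int, window_s: float, pkts_per_s: float) -> float:
    """Fraction of received packets that were reported as corrupt."""
    total_packets = window_s * pkts_per_s
    return failures / total_packets


# 167 log lines in 5 minutes at roughly 900 packets per second
rate = corruption_rate(failures=167, window_s=5 * 60, pkts_per_s=900)
print(f"{rate:.4%} of packets, about 1 in {round(1 / rate)}")
# prints "0.0619% of packets, about 1 in 1617"
```

So under those assumptions it is well below one packet in a thousand, but at this receive rate that is still a hit every couple of seconds on average.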

Next I'll try to capture some broken packets and reply in a separate mail;
I'll have to figure out a good way to do this first.


Thanks for your help,
-Matthias


