Message-ID: <20210701124732.Horde.HT4urccbfqv0Nr1Aayuy0BM@mail.your-server.de>
Date:   Thu, 01 Jul 2021 12:47:32 +0200
From:   Matthias Treydte <mt@...dheinz.de>
To:     stable@...r.kernel.org
Cc:     netdev@...r.kernel.org, regressions@...ts.linux.dev,
        davem@...emloft.net, yoshfuji@...ux-ipv6.org, dsahern@...nel.org
Subject: [regression] UDP recv data corruption

Hello,

we recently upgraded the Linux kernel from 5.11.21 to 5.12.12 in our
video stream receiver appliance and noticed compression artifacts on
video streams that previously looked fine. We receive UDP multicast
MPEG TS streams through an FFmpeg / libav layer, which does the socket
and lower-level protocol handling. On affected kernels it floods the
log with messages like

> [mpegts @ 0x7fa130000900] Packet corrupt (stream = 0, dts = 6870802195).
> [mpegts @ 0x7fa11c000900] Packet corrupt (stream = 0, dts = 6870821068).
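
To check the received data without the FFmpeg layer in between, something
like the following could be used. This is only a minimal sketch, not what
FFmpeg does internally: it assumes each UDP datagram carries whole 188-byte
TS packets (no RTP header), it ignores the discontinuity indicator, and the
names check_ts_datagram and receive are made up for illustration. The
multicast group and port have to be filled in by the caller.

```python
import socket
import struct

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def check_ts_datagram(data, last_cc):
    """Check one UDP payload made of MPEG TS packets.

    last_cc maps PID -> last continuity counter seen; it is updated in place.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    if len(data) % TS_PACKET_SIZE != 0:
        problems.append(f"payload length {len(data)} is not a multiple of 188")
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            problems.append(f"bad sync byte 0x{pkt[0]:02x} at offset {off}")
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)
        cc = pkt[3] & 0x0F
        # The counter only advances on packets that carry payload,
        # and null packets are not subject to the check.
        if pid == NULL_PID or not has_payload:
            continue
        prev = last_cc.get(pid)
        last_cc[pid] = cc
        # Simplification: a repeated counter is treated as a legal duplicate.
        if prev is not None and cc != (prev + 1) & 0x0F and cc != prev:
            problems.append(f"PID {pid}: continuity counter {prev} -> {cc}")
    return problems


def receive(group, port, count=10000):
    """Join an IPv4 multicast group and report problems for `count` datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    last_cc = {}
    for _ in range(count):
        data, _addr = sock.recvfrom(65535)
        for problem in check_ts_datagram(data, last_cc):
            print(problem)
```

Running receive() with the same group and port on a good and on an affected
kernel should show whether the payload is already damaged when it leaves
the socket.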

Bisecting identified commit 18f25dc399901426dff61e676ba603ff52c666f7  
as the one introducing the problem in the mainline kernel. It was  
backported to the 5.12 series in  
450687386cd16d081b58cd7a342acff370a96078. Some observations that may
help in understanding what's going on:

    * the problem exists in Linux 5.13
    * reverting that commit on top of 5.13 makes the problem go away
    * Linux 5.10.45 is fine
    * no relevant output in dmesg
    * can be reproduced on different hardware (Intel, AMD, different NICs, ...)
    * we do use the bonding driver on the systems (but I have not yet
      verified whether this is related)
    * we do not use vxlan (mentioned in the commit message)
    * the relevant code in FFmpeg identifying packet corruption is here:
      https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/mpegts.c#L2758

And the bonding configuration:

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.10.45

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp2s0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp2s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 80:ee:73:XX:XX:XX
Slave queue ID: 0

Slave Interface: enp3s0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 80:ee:73:XX:XX:XX
Slave queue ID: 0
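
For comparing this state across several machines, the output above can be
turned into dictionaries with a few lines of Python. This is a rough sketch
that assumes every relevant line has the "Key: Value" shape shown here;
parse_bonding is a made-up helper name, not part of any existing tool.

```python
def parse_bonding(text):
    """Parse /proc/net/bonding/<bond> text into (bond settings, per-slave settings)."""
    bond = {}
    slaves = {}
    current = bond
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without a "Key: Value" shape
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Every following line belongs to this slave until the next one.
            current = slaves.setdefault(value, {})
        else:
            current[key] = value
    return bond, slaves
```

With the output above, bond["Currently Active Slave"] gives the interface
that is actually carrying the traffic, and slaves maps each interface name
to its own settings.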


If there is anything else I can do to help track this down, please
let me know.


Regards,
-Matthias Treydte

