netdev - Re: [regression] UDP recv data corruption

Message-ID: <6c6eee2832c658d689895aa9585fd30f54ab3ed9.camel@redhat.com>
Date:   Fri, 02 Jul 2021 16:06:24 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     Matthias Treydte <mt@...dheinz.de>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     David Ahern <dsahern@...il.com>, stable@...r.kernel.org,
        netdev@...r.kernel.org, regressions@...ts.linux.dev,
        davem@...emloft.net, yoshfuji@...ux-ipv6.org, dsahern@...nel.org
Subject: Re: [regression] UDP recv data corruption

Hello,

On Fri, 2021-07-02 at 14:36 +0200, Matthias Treydte wrote:
> And to answer Paolo's questions from his mail to the list (@Paolo: I'm  
> not subscribed, please also send to me directly so I don't miss your mail)

(yup, that is what I did ?!?)

> > Could you please:
> > - tell how frequent is the pkt corruption, even a rough estimate of the
> > frequency.
> 
> # journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
> 167
> 
> So there are 167 detected failures in 5 minutes, while the system is receiving
> at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter
> at least, but seems about right)

Interesting. The relevant UDP GRO features are already off, and this
happens infrequently. Something is happening on a per packet basis, I
can't guess what.

It looks like you should be able to collect more info WRT the packet
corruption by enabling debug logging at the ffmpeg level, but I guess
that will flood the logfile.

If you have the kernel debuginfo and the 'perf' tool available, could
you please try:

perf probe -a 'udp_gro_receive sk sk->__sk_common.skc_dport'
perf probe -a 'udp_gro_receive_segment'

# need to wait until at least one pkt corruption happens, 10 seconds
# should be more than enough
perf record -a -e probe:udp_gro_receive -e probe:udp_gro_receive_segment sleep 10

perf script | gzip > perf_script.gz

and share the above? I fear it could be too big for the ML, feel free
to send it directly to me.

> Next I'll try to capture some broken packets and reply in a separate mail,
> I'll have to figure out a good way to do this first.

Looks like there is a corrupted packet every ~2K UDP ones. If you capture
a few thousand consecutive ones, then wireshark should probably help
finding the suspicious ones.
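
For reference, the rough estimate above follows from the figures quoted
earlier (167 failures in 5 minutes at about 900 pkts/s). A minimal sketch
of the arithmetic; the variable names and the safety factor of 3 for the
capture size are assumptions made for illustration, not measured values:

```shell
#!/bin/sh
# Figures quoted earlier in this thread.
corrupt=167     # "Packet corrupt" log lines in the window
window_s=300    # 5 minutes, in seconds
rate_pps=900    # approximate receive rate, pkts/s

total=$((rate_pps * window_s))       # packets received in the window
per_corrupt=$((total / corrupt))     # packets per corruption (integer division)

# Assumption: capturing 3x the average gap should include a few bad packets.
capture_count=$((per_corrupt * 3))

echo "packets in window:       $total"
echo "packets per corruption:  $per_corrupt"
echo "suggested capture count: $capture_count"

# Hypothetical capture command; IFACE is a placeholder for the receiving NIC:
# tcpdump -i IFACE -c "$capture_count" -w udp_capture.pcap udp
```

That is roughly one corrupted packet every 1.6K, so a capture of about 5K
consecutive packets should normally contain a few of them.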

Thanks!

Paolo