netdev - Re: bug: tpacket_snd can cause data corruption

Message-ID: <CAF=yD-+wHzfP6QWJzc=num_VaFvN3RYXV-c3+-VY8EjS87WEiA@mail.gmail.com>
Date:   Wed, 3 Jul 2019 12:07:32 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Frank de Brabander <debrabander@...il.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: bug: tpacket_snd can cause data corruption

On Wed, Jul 3, 2019 at 7:08 AM Frank de Brabander <debrabander@...il.com> wrote:
>
> In commit 5cd8d46e a fix was applied for data corruption in
> tpacket_snd. A selftest was added in commit 358be656 which
> validates this fix.
>
> Unfortunately this bug still persists, although since this fix it is
> less likely to trigger. This bug was initially observed using a PACKET_MMAP
> application, but can also be seen by tweaking the kernel selftest.
>
> By tweaking the selftest txring_overwrite.c to run
> as an infinite loop, the data corruption will still trigger. It
> seems to occur faster by generating interrupts (e.g. by plugging
> in USB devices). Tested with kernel version 5.2-RC7.
>
> Cause for this bug is still unclear.

The cause of the original bug is well understood.

I expect the issue you report is due to background traffic, and has
more to do with the test than with the kernel implementation.

Can you reproduce the issue when running the modified test in a
network namespace (./in_netns.sh ./txring_overwrite)?

I can reproduce the reported issue outside a network namespace, but
not inside one. That implies that what we're observing is random
background traffic. The modified test drops the unexpected packet
because it mismatches on length. As a result, the next read (the test
always sends two packets, then reads both) reports a data mismatch,
because it is reading the first test packet but expecting the second.
Output with a bit more data:

count: 200
count: 300
count: 400
count: 500
 read: 90B != 100B
wrong pattern: 0x61 != 0x62
count: 600
count: 700
count: 800
 read: 90B != 100B
wrong pattern: 0x61 != 0x62
count: 900
 read: 90B != 100B
wrong pattern: 0x61 != 0x62

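Notice the clear pattern: every length mismatch (an unexpected 90B
packet) is immediately followed by a pattern mismatch in which the
first test packet (0x61) is read where the second (0x62) was expected.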
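
To make the off-by-one explicit, here is a rough userspace sketch of
the sequence described above. It is not the selftest code: the function
names, the deque standing in for the receive socket, and the assumption
that the background packet arrives ahead of the two test packets are
all made up for illustration.

```python
from collections import deque

PKT_LEN = 100  # length the test expects, taken from the output above


def read_verify(rx, expected_char):
    """Hypothetical stand-in for one read-and-verify step of the test."""
    pkt = rx.popleft()
    if len(pkt) != PKT_LEN:
        # Unexpected packet: reported and dropped, but this read is used up.
        return f"read: {len(pkt)}B != {PKT_LEN}B"
    if pkt[0] != expected_char:
        return f"wrong pattern: {pkt[0]:#x} != {expected_char:#x}"
    return "ok"


def run_iteration(background=None):
    """Send two test packets, then read both, as the test does.

    Assumption: a background packet, if any, is queued before the two
    test packets. A fresh queue is used per iteration for simplicity.
    """
    rx = deque()
    if background is not None:
        rx.append(background)
    rx.append(b"a" * PKT_LEN)  # first test packet, 0x61
    rx.append(b"b" * PKT_LEN)  # second test packet, 0x62
    return [read_verify(rx, ord("a")), read_verify(rx, ord("b"))]


if __name__ == "__main__":
    print(run_iteration())                        # no background traffic
    print(run_iteration(background=b"x" * 90))    # one stray 90B packet
```

With a stray 90-byte packet queued first, the first read reports the
length mismatch and the second read sees the 'a' packet while expecting
'b', which is the pair of messages in the output above. No data was
overwritten in either packet.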

This does not trigger inside a network namespace, which is how
kselftest invokes txring_overwrite (from run_afpackettests).