netdev - Re: [TEST] tcp_zerocopy

Message-ID: <willemdebruijn.kernel.2303cd61bcc5e@gmail.com>
Date: Tue, 25 Nov 2025 15:44:02 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Neal Cardwell <ncardwell@...gle.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, 
 Willem de Bruijn <willemb@...gle.com>, 
 netdev@...r.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails

Neal Cardwell wrote:
> On Tue, Nov 25, 2025 at 2:49 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Neal Cardwell wrote:
> > > On Mon, Nov 24, 2025 at 11:33 AM Willem de Bruijn
> > > <willemdebruijn.kernel@...il.com> wrote:
> > > >
> > > > Jakub Kicinski wrote:
> > > > > Hi Willem!
> > > > >
> > > > > I migrated netdev CI to our own infra now, and the slightly faster,
> > > > > Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> > > > >
> > > > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > > > # script packet:  1.000237 P. 36:37(1) ack 1
> > > > > # actual packet:  1.000235 P. 36:37(1) ack 1 win 1050
> > > > > # not ok 1 ipv4
> > > > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > > > # script packet:  1.000209 P. 36:37(1) ack 1
> > > > > # actual packet:  1.000208 P. 36:37(1) ack 1 win 1050
> > > > > # not ok 2 ipv6
> > > > > # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> > > > >
> > > > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> > > > >
> > > > > This happens on both debug and non-debug kernel (tho on the former
> > > > > the failure is masked due to MACHINE_SLOW).
> > > >
> > > > That's an odd error.
> > > >
> > > > The test sends a msg_iov of 18 one-byte fragments and verifies that
> > > > only 17 fit in one packet, followed by a single one-byte packet. The
> > > > test does not explicitly initialize the payload, but trusts packetdrill
> > > > to handle that. Relevant snippet below.
> > > >
> > > > Packetdrill complains about payload contents. That error is only
> > > > generated by the below check in run_packet.c. Pretty straightforward.
> > > >
> > > > Packetdrill agrees that the packet is one byte long. The win argument
> > > > is optional on outgoing packets, not relevant to the failure.
> > > >
> > > > So somehow the data in that frag got overwritten in the short window
> > > > between when it was injected into the kernel and when it was observed?
> > > > Seems so unlikely.
> > > >
> > > > Sorry, I'm a bit at a loss at least initially as to the cause.
> > >
> > > I agree this is odd. It looks like either a very concerning kernel
> > > bug, or very concerning packetdrill bug. :-)
> > >
> > > Could someone please run the test with tcpdump in the background to
> > > capture the full packet contents, to verify that indeed the packet has
> > > the wrong contents?
> > >
> > > This would help make sure that this is a kernel bug and not a
> > > packetdrill bug. :-)
> >
> > I'm not able to reproduce this on my own machine with the latest nn.
> > But could reproduce it on the netdev machine.
> >
> > I assume all payload is supposed to be zeroed. And indeed the packet
> > seen has a non-zero single byte of payload: 0x60.
> >
> > Is there any chance that this happens on some kernel with
> > unsubmitted patches, but not on netdev-nn/main on this machine either?
> >
> > ----
> >
> > tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect
> > outbound data payload
> > script packet:  1.000169 P. 36:37(1) ack 1
> > actual packet:  1.000167 P. 36:37(1) ack 1 win 1050
> >
> > 14:42:01.330694 tun0  Out IP6 fd3d:a0b:17d6::1.webcache >
> > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 19:36, ack 1, win 1050,
> > length 17: HTTP
> >         0x0000:  6000 842c 0025 0640 fd3d 0a0b 17d6 0000
> >         0x0010:  0000 0000 0000 0001 fd3d fa7b d17d 0000
> >         0x0020:  0000 0000 0000 0001 1f90 c6d5 f7fe 05e9
> >         0x0030:  0000 0001 5018 041a e883 0000 0000 0000
> >         0x0040:  0000 0000 0000 0000 0000 0000 00
> > 14:42:01.330723 tun0  In  IP6 fd3d:fa7b:d17d::1.50901 >
> > fd3d:a0b:17d6::1.webcache: Flags [.], ack 36, win 257, length 0
> >         0x0000:  6000 0000 0014 06ff fd3d fa7b d17d 0000
> >         0x0010:  0000 0000 0000 0001 fd3d 0a0b 17d6 0000
> >         0x0020:  0000 0000 0000 0001 c6d5 1f90 0000 0001
> >         0x0030:  f7fe 05fa 5010 0101 e21b 0000
> > 14:42:01.330727 tun0  Out IP6 fd3d:a0b:17d6::1.webcache >
> > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 36:37, ack 1, win 1050,
> > length 1: HTTP
> >         0x0000:  6000 842c 0015 0640 fd3d 0a0b 17d6 0000
> >         0x0010:  0000 0000 0000 0001 fd3d fa7b d17d 0000
> >         0x0020:  0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
> >         0x0030:  0000 0001 5018 041a e873 0000 60
> 
> Looking at the tests in tools/testing/selftests/net/packetdrill/, I
> don't see anything that sets the --send_omit_free packetdrill flag.
> That flag is needed for TCP zero copy tests, to ensure that
> packetdrill doesn't free the send() buffer after the send() call.
> 
> Because the test didn't use the --send_omit_free flag, packetdrill
> freed the buffer. And the memory probably got reused before the
> transmit. Perhaps for an IPv6 packet, whose first byte is 0x60, and
> thus what was transmitted was the garbage 0x60.
> 
> Does that sound plausible, Willem? If you agree, do you have cycles to
> cook a commit of some kind to fix this?
> 
> One option is to put the --send_omit_free flag near the top of the
> tools/testing/selftests/net/packetdrill/tcp_zerocopy_maxfrags.pkt
> script.
> 
> Thanks!
> 
> neal
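
For reference, the last capture quoted above can be decoded by hand to check
Neal's reading. This is only a minimal Python sketch: the hex bytes are copied
from the tcpdump output of the "seq 36:37" packet with the offset column
removed, and the variable names are illustrative, not taken from packetdrill
or the kernel.

```python
import struct

# Bytes of the "seq 36:37" packet, copied from the tcpdump output above
# (offset column removed).
HEX = """
6000 842c 0015 0640 fd3d 0a0b 17d6 0000
0000 0000 0000 0001 fd3d fa7b d17d 0000
0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
0000 0001 5018 041a e873 0000 60
"""

pkt = bytes.fromhex("".join(HEX.split()))

IPV6_HDR_LEN = 40  # fixed IPv6 header; assumes no extension headers

# IPv6 header: version is the top nibble of byte 0, then payload length
# (bytes 4-5) and next header (byte 6).
version = pkt[0] >> 4
payload_len, next_header = struct.unpack("!HB", pkt[4:7])

# TCP header: ports, seq, ack, data offset + flags, window.
tcp = pkt[IPV6_HDR_LEN:]
sport, dport, seq, ack, off_flags, window = struct.unpack("!HHIIHH", tcp[:16])
tcp_hdr_len = (off_flags >> 12) * 4  # data offset is in 32-bit words
payload = tcp[tcp_hdr_len:]

print(f"IPv{version}, next header {next_header}, payload length {payload_len}")
print(f"TCP {sport} > {dport}, win {window}, header {tcp_hdr_len} bytes")
print(f"TCP payload: {payload.hex()}  first byte of IP header: {pkt[0]:02x}")
```

The single payload byte decodes to 0x60, the same value as the first byte of
the IPv6 header (version 6, traffic class 0). That is consistent with the
freed send() buffer having been reused for an IPv6 packet, though the capture
alone does not prove it.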

Thanks Neal!

I verified that that fixed the failure. And that our original Google
internal runner passes that flag on the command line, only for these
zerocopy tests.

I can send a fix.

However, the ipv4 test appears to be failing with a different, equally
surprising error: it times out just waiting for the SYNACK.

    ./ksft_runner.sh tcp_zerocopy_maxfrags.pkt
    TAP version 13
    1..2
    tcp_zerocopy_maxfrags.pkt:25: error handling packet: Timed out waiting for packet

That corresponds to the last line of this snippet:

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 setsockopt(3, SOL_SOCKET, SO_ZEROCOPY, [1], 4) = 0

   // Each pinned zerocopy page is fully accounted to skb->truesize.
   // This test generates a worst case packet with each frag storing
   // one byte, but increasing truesize with a page (64KB on PPC).
   +0 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [2000000], 4) = 0

   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>