Message-ID: <willemdebruijn.kernel.1fe4306a89d08@gmail.com>
Date: Mon, 24 Nov 2025 11:29:31 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, 
 Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails

Jakub Kicinski wrote:
> Hi Willem!
> 
> I migrated netdev CI to our own infra now, and the slightly faster,
> Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> 
> # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> # script packet:  1.000237 P. 36:37(1) ack 1 
> # actual packet:  1.000235 P. 36:37(1) ack 1 win 1050 
> # not ok 1 ipv4
> # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> # script packet:  1.000209 P. 36:37(1) ack 1 
> # actual packet:  1.000208 P. 36:37(1) ack 1 win 1050 
> # not ok 2 ipv6
> # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> 
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> 
> This happens on both debug and non-debug kernel (tho on the former 
> the failure is masked due to MACHINE_SLOW).

That's an odd error.

The test sends a msg_iov of 18 one-byte fragments and verifies that
only 17 fit in one packet, followed by a single one-byte packet. The
test does not explicitly initialize the payload, but trusts packetdrill
to handle that. Relevant snippet below.

Packetdrill complains about payload contents. That error is only
generated by the check below in run_packet.c. Pretty straightforward.

Packetdrill agrees that the packet is one byte long. The win argument
is optional on outgoing packets, so it is not relevant to the failure.

So somehow the data in that frag got overwritten in the short window
between when it was injected into the kernel and when it was observed?
Seems so unlikely.

Sorry, I'm a bit at a loss, at least initially, as to the cause.

----

   // send a zerocopy iov of 18 elements:
   +1 sendmsg(4, {msg_name(...)=...,
                  msg_iov(18)=[{..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}],
                  msg_flags=0}, MSG_ZEROCOPY) = 18

   // verify that it is split in one skb of 17 frags + 1 of 1 frag
   // verify that both have the PSH bit set
   +0 > P. 19:36(17) ack 1
   +0 < . 1:1(0) ack 36 win 257

   +0 > P. 36:37(1) ack 1
   +0 < . 1:1(0) ack 37 win 257
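
For reference, the split the script expects can be reproduced with a
little arithmetic. The following is only a rough sketch, not kernel
code: it assumes the only limit in play is a per-skb fragment cap of 17
(taken from the test's expectation), and the names MAX_FRAGS_PER_SKB,
struct seg and split_frags() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: per-skb fragment cap implied by the test (17 of 18 fit). */
#define MAX_FRAGS_PER_SKB 17

/* Hypothetical type: one expected segment as a [seq_start, seq_end) range. */
struct seg {
        unsigned int seq_start;
        unsigned int seq_end;
};

/* Split nfrags fragments of frag_len bytes each into segments that hold at
 * most MAX_FRAGS_PER_SKB fragments. Fills up to max_segs entries of out and
 * returns the number of segments needed.
 */
size_t split_frags(unsigned int seq, size_t nfrags, size_t frag_len,
                   struct seg *out, size_t max_segs)
{
        size_t nsegs = 0;

        assert(out != NULL || max_segs == 0);
        while (nfrags > 0) {
                size_t take = nfrags < MAX_FRAGS_PER_SKB ?
                              nfrags : MAX_FRAGS_PER_SKB;
                unsigned int bytes = (unsigned int)(take * frag_len);

                if (nsegs < max_segs) {
                        out[nsegs].seq_start = seq;
                        out[nsegs].seq_end = seq + bytes;
                }
                seq += bytes;
                nfrags -= take;
                nsegs++;
        }
        return nsegs;
}
```

Calling split_frags(19, 18, 1, segs, 4) yields two segments, 19:36 and
36:37, which matches the two outbound packets in the script above.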

----

/* Verify TCP/UDP payload matches expected value. */
static int verify_outbound_live_payload(
        struct packet *actual_packet,
        struct packet *script_packet, char **error)
{
        /* Diff the TCP/UDP data payloads. We've already implicitly
         * checked their length by checking the IP and TCP/UDP headers.
         */
        assert(packet_payload_len(actual_packet) ==
               packet_payload_len(script_packet));
        if (memcmp(packet_payload(script_packet),
                   packet_payload(actual_packet),
                   packet_payload_len(script_packet)) != 0) {
                asprintf(error, "incorrect outbound data payload");
                return STATUS_ERR;
        }
        return STATUS_OK;
}
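
The memcmp() above only says that the payloads differ, not where or how.
If someone wants to dig further, a small helper along these lines could
report the first differing offset so the expected and actual bytes can
be printed. This is a hypothetical debugging aid, not something that
exists in packetdrill; first_payload_mismatch() is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debugging helper, not part of packetdrill.
 * Returns the offset of the first byte at which two equal-length payloads
 * differ, or -1 if all len bytes match.
 */
long first_payload_mismatch(const unsigned char *expected,
                            const unsigned char *actual, size_t len)
{
        size_t i;

        assert(len == 0 || (expected != NULL && actual != NULL));
        for (i = 0; i < len; i++) {
                if (expected[i] != actual[i])
                        return (long)i;
        }
        return -1;
}
```

For the failing one-byte packet the offset can only be 0, but logging
expected[0] and actual[0] would at least show what value ended up on the
wire.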

