linux-kernel - Re: [PATCH net-next v4 0/4] net: selftest: improve test string formatting and checksum handling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aFphGj_57XnwyhW1@pengutronix.de>
Date: Tue, 24 Jun 2025 10:26:02 +0200
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>, kernel@...gutronix.de,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	Maxime Chevallier <maxime.chevallier@...tlin.com>
Subject: Re: [PATCH net-next v4 0/4] net: selftest: improve test string
 formatting and checksum handling

On Mon, Jun 23, 2025 at 10:19:20AM -0700, Jakub Kicinski wrote:
> On Mon, 23 Jun 2025 13:45:41 +0200 Oleksij Rempel wrote:
> > On Sat, Jun 21, 2025 at 06:46:00AM -0700, Jakub Kicinski wrote:
> > > On Fri, 20 Jun 2025 12:53:23 +0200 Oleksij Rempel wrote:
> > > > Let me first describe the setup where this issue was observed and my findings.
> > > > The problem occurs on a system utilizing a Microchip DSA driver with an STMMAC
> > > > Ethernet controller attached to the CPU port.
> > > > 
> > > > In the current selftest implementation, the TCP checksum validation fails,
> > > > while the UDP test passes. The existing code prepares the skb for hardware
> > > > checksum offload by setting skb->ip_summed = CHECKSUM_PARTIAL. For TCP, it sets
> > > > the thdr->check field to the complement of the pseudo-header checksum, and for
> > > > UDP, it uses udp4_hwcsum. If I understand it correct, this configuration tells
> > > > the kernel that the hardware should perform the checksum calculation.
> > > > 
> > > > However, during testing, I noticed that "rx-checksumming" is enabled by default
> > > > on the CPU port, and this leads to the TCP test failure.  Only after disabling
> > > > "rx-checksumming" on the CPU port did the selftest pass. This suggests that the
> > > > issue is specifically related to the hardware checksum offload mechanism in
> > > > this particular setup. The behavior indicates that something on the path
> > > > recalculated the checksum incorrectly.  
> > > 
> > > Interesting, that sounds like the smoking gun. When rx-checksumming 
> > > is enabled the packet still reaches the stack right?  
> > 
> > No. It looks like this packets are just silently dropped, before they was
> > seen by the stack. The only counter which confirms presence of this
> > frames is HW specific mmc_rx_tcp_err. But it will be increasing even if
> > rx-checksumming is disabled and packets are forwarded to the stack.
> 
> If you happen to have the docs for the STMMAC instantiation in the SoC
> it'd be good to check if discarding frames with bad csum can be
> disabled. Various monitoring systems will expect the L4 checksum errors
> to appear in nstat, not some obscure ethtool -S counter.

Ack. I will it add to my todo.


For proper understanding of STMMAC and other drivers, here is how I currently
understand the expected behavior on the receive path, with some open questions:

Receive Path Checksum Scenarios

* No Hardware Verification
    * The hardware is not configured for RX checksum offload
      or does not support the packet type, passing the packet to the driver
      as-is.
    * Expected driver behavior: The driver should set the packet's state to
      `CHECKSUM_NONE`, signaling to the kernel that a software checksum
      validation is required.

* Hardware Verifies and Reports All Frames (Ideal Linux Behavior)
    * The hardware is configured not to drop packets with bad checksums.
      It verifies the checksum of each packet and reports the result (good
      or bad) in a status field on the DMA descriptor.
    * Expected driver behavior: The driver must read the status for every
      packet.
        * If the hardware reports the checksum is good, the driver should set
          the packet's state to `CHECKSUM_UNNECESSARY`.
        * If the hardware reports the checksum is bad, the driver should set
          the packet's state to `CHECKSUM_NONE` and still pass it to the
          kernel.
    * Open Questions:
        * When the hardware reports a bad checksum in this mode, should the
          driver increment `rx_crc_errors` immediately? Or should it only set
          the packet's state to `CHECKSUM_NONE` and let the kernel stack find
          the error and increment the counter, in order to avoid
          double-counting the same error?

* Hardware Verifies and Drops on Error
    * The hardware's RX checksum engine is active and configured to
      automatically discard any packet with an incorrect checksum before it is
      delivered to the driver.
    * Open Questions:

        * When reporting these hardware-level drops, what is the most
          appropriate existing standard `net_device_stats` counter to use
          (e.g., `rx_crc_errors`, `rx_errors`)?
        * If no existing standard counter is a good semantic fit, add new
          standard counters?
        * If the "drop on error" feature cannot be disabled independently,
          and reporting the error via a standard counter is not feasible,
          does this imply that the entire RX checksum offload feature must be
          disabled to ensure error visibility?

* Hardware Provides Full Packet Checksum (`CHECKSUM_COMPLETE`)
    * The hardware calculates a single checksum over the entire packet and
      provides this value to the driver, without needing to parse the
      L3/L4 headers.
    * Expected driver behavior: The driver should place the checksum provided
      by the hardware into the `skb->csum` field and set the packet's state
      to `CHECKSUM_COMPLETE`.

Anything I forgot?

> > > If so does the frame enter the stack with CHECKSUM_COMPLETE or
> > > UNNECESSARY?  
> > 
> > If rx-checksumming is enabled and packet has supported ethertype,
> > then CHECKSUM_UNNECESSARY will be set. Otherwise CHECKSUM_NONE.
> > 
> > > > When examining the loopbacked frames, I observed that the TCP checksum was
> > > > incorrect. Upon further investigation, the xmit helper in net/dsa/tag_ksz.c
> > > > includes the following:
> > > > 
> > > > if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
> > > >     return NULL;
> > > > 
> > > > I assume skb_checksum_help() is intended to calculate the proper checksum when
> > > > CHECKSUM_PARTIAL is set, indicating that the software should complete the
> > > > checksum before handing it to the hardware. My understanding is that the STMMAC
> > > > hardware then calculates the checksum for egress frames if CHECKSUM_PARTIAL is
> > > > used.  
> > > 
> > > stmmac shouldn't touch the frame, note that skb_checksum_help() sets
> > > skb->ip_summed = CHECKSUM_NONE; so the skb should no longer be considered
> > > for csum offload.  
> > 
> > It looks like skb_checksum_help(), which is used in tag_ksz.c, generates
> > a TCP checksum without accounting for the IP pseudo-header. The
> > resulting checksum is then incorrect and is filtered out by the STMMAC
> > HW on ingress
> 
> The pseudo-header csum is filled in net_test_get_skb(), where it calls
> tcp_v4_check(). But I think you're right, it's incorrect. Could you try:
> 
> diff --git a/net/core/selftests.c b/net/core/selftests.c
> index 35f807ea9952..1166dd1ddb07 100644
> --- a/net/core/selftests.c
> +++ b/net/core/selftests.c
> @@ -160,8 +160,10 @@ static struct sk_buff *net_test_get_skb(struct net_device *ndev,
>         skb->csum = 0;
>         skb->ip_summed = CHECKSUM_PARTIAL;
>         if (attr->tcp) {
> -               thdr->check = ~tcp_v4_check(skb->len, ihdr->saddr,
> -                                           ihdr->daddr, 0);
> +               int l4len;
> +
> +               l4len = skb->tail - skb_transport_header(skb);
> +               thdr->check = ~tcp_v4_check(l4len, ihdr->saddr, ihdr->daddr, 0);
>                 skb->csum_start = skb_transport_header(skb) - skb->head;
>                 skb->csum_offset = offsetof(struct tcphdr, check);
>         } else {
> 
> Or some such?

Ah, it works now!

So, for my understanding:
- does skb_checksum_help() rely on a precalculated and integrated
  pseudo-header csum?
- And is this how typical HW-accelerated checksumming works?
- Is this why it is called CHECKSUM_PARTIAL, because only one part of the
  checksum is pre-calculated?

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |