lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250624090953.1b6d28e6@kernel.org>
Date: Tue, 24 Jun 2025 09:09:53 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
 <horms@...nel.org>, kernel@...gutronix.de, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, Maxime Chevallier <maxime.chevallier@...tlin.com>
Subject: Re: [PATCH net-next v4 0/4] net: selftest: improve test string
 formatting and checksum handling

On Tue, 24 Jun 2025 10:26:02 +0200 Oleksij Rempel wrote:
> On Mon, Jun 23, 2025 at 10:19:20AM -0700, Jakub Kicinski wrote:
> > On Mon, 23 Jun 2025 13:45:41 +0200 Oleksij Rempel wrote:  
> > > On Sat, Jun 21, 2025 at 06:46:00AM -0700, Jakub Kicinski wrote:  
> > > > On Fri, 20 Jun 2025 12:53:23 +0200 Oleksij Rempel wrote:  
> > > > > Let me first describe the setup where this issue was observed and my findings.
> > > > > The problem occurs on a system utilizing a Microchip DSA driver with an STMMAC
> > > > > Ethernet controller attached to the CPU port.
> > > > > 
> > > > > In the current selftest implementation, the TCP checksum validation fails,
> > > > > while the UDP test passes. The existing code prepares the skb for hardware
> > > > > checksum offload by setting skb->ip_summed = CHECKSUM_PARTIAL. For TCP, it sets
> > > > > the thdr->check field to the complement of the pseudo-header checksum, and for
> > > > > UDP, it uses udp4_hwcsum. If I understand it correct, this configuration tells
> > > > > the kernel that the hardware should perform the checksum calculation.
> > > > > 
> > > > > However, during testing, I noticed that "rx-checksumming" is enabled by default
> > > > > on the CPU port, and this leads to the TCP test failure.  Only after disabling
> > > > > "rx-checksumming" on the CPU port did the selftest pass. This suggests that the
> > > > > issue is specifically related to the hardware checksum offload mechanism in
> > > > > this particular setup. The behavior indicates that something on the path
> > > > > recalculated the checksum incorrectly.    
> > > > 
> > > > Interesting, that sounds like the smoking gun. When rx-checksumming 
> > > > is enabled the packet still reaches the stack right?    
> > > 
> > > No. It looks like this packets are just silently dropped, before they was
> > > seen by the stack. The only counter which confirms presence of this
> > > frames is HW specific mmc_rx_tcp_err. But it will be increasing even if
> > > rx-checksumming is disabled and packets are forwarded to the stack.  
> > 
> > If you happen to have the docs for the STMMAC instantiation in the SoC
> > it'd be good to check if discarding frames with bad csum can be
> > disabled. Various monitoring systems will expect the L4 checksum errors
> > to appear in nstat, not some obscure ethtool -S counter.  
> 
> Ack. I will it add to my todo.
> 
> For proper understanding of STMMAC and other drivers, here is how I currently
> understand the expected behavior on the receive path, with some open questions:
> 
> Receive Path Checksum Scenarios
> 
> * No Hardware Verification
>     * The hardware is not configured for RX checksum offload
>       or does not support the packet type, passing the packet to the driver
>       as-is.
>     * Expected driver behavior: The driver should set the packet's state to
>       `CHECKSUM_NONE`, signaling to the kernel that a software checksum
>       validation is required.
> 
> * Hardware Verifies and Reports All Frames (Ideal Linux Behavior)
>     * The hardware is configured not to drop packets with bad checksums.
>       It verifies the checksum of each packet and reports the result (good
>       or bad) in a status field on the DMA descriptor.
>     * Expected driver behavior: The driver must read the status for every
>       packet.
>         * If the hardware reports the checksum is good, the driver should set
>           the packet's state to `CHECKSUM_UNNECESSARY`.
>         * If the hardware reports the checksum is bad, the driver should set
>           the packet's state to `CHECKSUM_NONE` and still pass it to the
>           kernel.
>     * Open Questions:
>         * When the hardware reports a bad checksum in this mode, should the
>           driver increment `rx_crc_errors` immediately? Or should it only set
>           the packet's state to `CHECKSUM_NONE` and let the kernel stack find
>           the error and increment the counter, in order to avoid
>           double-counting the same error?

Driver can increment its local counter. It doesn't matter much.

But one important distinction, we're talking about layer 3 and up
checksums. IPv4 checksum, and TCP/UDP checksums. Those are not CRC.
The HW _should_ discard packets with bad CRC / Layer 2 checksum
unless the NETIF_F_RXALL feature is enabled.

> * Hardware Verifies and Drops on Error
>     * The hardware's RX checksum engine is active and configured to
>       automatically discard any packet with an incorrect checksum before it is
>       delivered to the driver.
>     * Open Questions:
> 
>         * When reporting these hardware-level drops, what is the most
>           appropriate existing standard `net_device_stats` counter to use
>           (e.g., `rx_crc_errors`, `rx_errors`)?

I'd say rx_errors, most likely to be noticed.

>         * If no existing standard counter is a good semantic fit, add new
>           standard counters?

Given this is behavior we don't want to encourage I think adding a
standard stat would send the wrong signal.

>         * If the "drop on error" feature cannot be disabled independently,
>           and reporting the error via a standard counter is not feasible,
>           does this imply that the entire RX checksum offload feature must be
>           disabled to ensure error visibility?

Probably not, users should also monitor rx_errors.

> * Hardware Provides Full Packet Checksum (`CHECKSUM_COMPLETE`)
>     * The hardware calculates a single checksum over the entire packet and
>       provides this value to the driver, without needing to parse the
>       L3/L4 headers.

Not entire, it skips the base Ethernet header (first 14 bytes)

>     * Expected driver behavior: The driver should place the checksum provided
>       by the hardware into the `skb->csum` field and set the packet's state
>       to `CHECKSUM_COMPLETE`.

Correct.

> > > > If so does the frame enter the stack with CHECKSUM_COMPLETE or
> > > > UNNECESSARY?    
> > > 
> > > If rx-checksumming is enabled and packet has supported ethertype,
> > > then CHECKSUM_UNNECESSARY will be set. Otherwise CHECKSUM_NONE.
> > >   
> > > > > When examining the loopbacked frames, I observed that the TCP checksum was
> > > > > incorrect. Upon further investigation, the xmit helper in net/dsa/tag_ksz.c
> > > > > includes the following:
> > > > > 
> > > > > if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))
> > > > >     return NULL;
> > > > > 
> > > > > I assume skb_checksum_help() is intended to calculate the proper checksum when
> > > > > CHECKSUM_PARTIAL is set, indicating that the software should complete the
> > > > > checksum before handing it to the hardware. My understanding is that the STMMAC
> > > > > hardware then calculates the checksum for egress frames if CHECKSUM_PARTIAL is
> > > > > used.    
> > > > 
> > > > stmmac shouldn't touch the frame, note that skb_checksum_help() sets
> > > > skb->ip_summed = CHECKSUM_NONE; so the skb should no longer be considered
> > > > for csum offload.    
> > > 
> > > It looks like skb_checksum_help(), which is used in tag_ksz.c, generates
> > > a TCP checksum without accounting for the IP pseudo-header. The
> > > resulting checksum is then incorrect and is filtered out by the STMMAC
> > > HW on ingress  
> > 
> > The pseudo-header csum is filled in net_test_get_skb(), where it calls
> > tcp_v4_check(). But I think you're right, it's incorrect. Could you try:
> > 
> > diff --git a/net/core/selftests.c b/net/core/selftests.c
> > index 35f807ea9952..1166dd1ddb07 100644
> > --- a/net/core/selftests.c
> > +++ b/net/core/selftests.c
> > @@ -160,8 +160,10 @@ static struct sk_buff *net_test_get_skb(struct net_device *ndev,
> >         skb->csum = 0;
> >         skb->ip_summed = CHECKSUM_PARTIAL;
> >         if (attr->tcp) {
> > -               thdr->check = ~tcp_v4_check(skb->len, ihdr->saddr,
> > -                                           ihdr->daddr, 0);
> > +               int l4len;
> > +
> > +               l4len = skb->tail - skb_transport_header(skb);
> > +               thdr->check = ~tcp_v4_check(l4len, ihdr->saddr, ihdr->daddr, 0);
> >                 skb->csum_start = skb_transport_header(skb) - skb->head;
> >                 skb->csum_offset = offsetof(struct tcphdr, check);
> >         } else {
> > 
> > Or some such?  
> 
> Ah, it works now!
> 
> So, for my understanding:
> - does skb_checksum_help() rely on a precalculated and integrated
>   pseudo-header csum?
> - And is this how typical HW-accelerated checksumming works?
> - Is this why it is called CHECKSUM_PARTIAL, because only one part of the
>   checksum is pre-calculated?

IDK why it's called PARTIAL, your guess seems reasonable :)
And yes on the other questions. PARTIAL means the HW is supposed to do
a csum over a linear buffer and write it where pointed without header
parsing. Because of how UDP and TCP define the csum for them that means
the csum field has to be set to the pseudo header csum.

Let me send the fix officially.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ