lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240104005327.a6747bpbqt24xlbo@skbuf>
Date: Thu, 4 Jan 2024 02:53:27 +0200
From: Vladimir Oltean <olteanv@...il.com>
To: Linus Walleij <linus.walleij@...aro.org>
Cc: Hans Ulli Kroll <ulli.kroll@...glemail.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Household Cang <canghousehold@....com>,
	Romain Gantois <romain.gantois@...tlin.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming
 engine of alien ethertypes

On Tue, Jan 02, 2024 at 09:34:26PM +0100, Linus Walleij wrote:
> We had workarounds were the ethernet checksumming engine would be bypassed

s/were/where/

> for larger frames, this fixed devices using DSA, but regressed devices
> where the ethernet was connected directly to a PHY.
> 
> The devices with a PHY connected directly can't handle large frames
> either way, with or without bypass. Looking at the size of the frame
> is probably just wrong.

"Looking at the size of the frame is probably just wrong." yet you keep it.

Not only is this confusing for you to say this, but I believe that the
skb->len check is the _only_ thing that is needed. Explanation below.

> Rework the workaround such that we don't activate the checksumming engine if
> the ethertype inside the actual frame is something else than 0x0800
> (IPv4) or 0x86dd (IPv6). These are the only frames the checksumming engine
> can actually handle. VLAN framing (0x8100) also works fine.

Premise:

This driver does not set NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM (or anything)
in dev->vlan_features. Upper interface drivers which look at dev->vlan_features
in order to determine their own features are 8021q and DSA.

Packets transmitted through stacked interfaces have 3 checksumming points.
Two in software, during validate_xmit_skb() on the respective netdev,
depending on its features and skb->ip_summed, and one in the xmit
procedure of the hardware driver - gmac_start_xmit().

In short, I believe that the code which you have added to inspect the
ethertype - and based on that to avoid the "if (skb->ip_summed == CHECKSUM_PARTIAL)"
test - is bogus (a cost you are paying for nothing).

I'm saying this because I think that those
"(ethertype != ETH_P_IP && ethertype != ETH_P_IPV6)" frames wouldn't
have entered the "skb->ip_summed == CHECKSUM_PARTIAL" test anyway.

DSA-tagged frames should come with CHECKSUM_NONE, having been checksummed
in software already, by the first validate_xmit_skb() - DSA not having
inherited the checksum offload feature, because it's not in dev->vlan_features.

Coincidentally, this is also the reason why in your tests, DSA-tagged
TCP/UDP traffic still has a proper checksum, despite you bypassing the
hardware offload, and no longer calling skb_checksum_help() from the
driver. It was never needed, because the checksum was always already
calculated.

And VLAN traffic should also come with CHECKSUM_NONE, for the same reason.

The one difference between DSA and VLAN is that for DSA, you sometimes
set TSS_BYPASS_BIT (for large frames) and for VLAN you never do.

> 
> We can't inspect skb->protocol because DSA frames will sometimes have a
> custom ethertype despite skb->protocol is e.g. 0x0800.
> 
> If the frame is ALSO over the size of an ordinary ethernet frame,
> we will actively bypass the checksumming engine. (Always doing this
> makes the hardware unstable.)
> 
> After this both devices with direct ethernet attached such as D-Link
> DNS-313 and devices with a DSA switch with a custom ethertype such as
> D-Link DIR-685 work fine.
> 
> Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
> Signed-off-by: Linus Walleij <linus.walleij@...aro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 5e399c6e095b..68da4ae26248 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -1142,22 +1143,38 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	struct gmac_txdesc *txd;
>  	skb_frag_t *skb_frag;
>  	dma_addr_t mapping;
> +	u16 ethertype;
>  	void *buffer;
>  
>  	/* TODO: implement proper TSO using MTU in word3 */
>  	word1 = skb->len;
>  	word3 = SOF_BIT | skb->len;
>  
> -	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +	/* Dig out the the ethertype actually in the buffer and not what the
> +	 * protocol claims to be. This is the raw data that the checksumming
> +	 * offload engine will have to deal with.
> +	 */
> +	ethertype = ntohs(eth_header_parse_protocol(skb));
> +	/* This is the only VLAN type supported by this hardware so check for
> +	 * that: the checksumming engine can handle IP and IPv6 inside 802.1Q.
> +	 */
> +	if (ethertype == ETH_P_8021Q)
> +		ethertype = ntohs(__vlan_get_protocol(skb, htons(ethertype), NULL));

Random fact: if you store "ethertype" as __be16 and perform htons() on the
constant value instead, the htons() operation will be performed at compile
time and should result in fewer instructions per packet in the fast path.

> +
> +	if (ethertype != ETH_P_IP && ethertype != ETH_P_IPV6) {
> +		/* Hardware offloaded checksumming isn't working on non-IP frames.
> +		 * This happens for example on some DSA switches using a custom
> +		 * ethertype. When a frame gets bigger than a standard ethernet
> +		 * frame, it also needs to actively bypass the checksumming engine.
> +		 * There is no clear explanation to why it is like this, the
> +		 * reference manual has left the TSS completely undocumented.
> +		 */
> +		if (skb->len > ETH_FRAME_LEN)
> +			word1 |= TSS_BYPASS_BIT;

Do you know what "TSS_BYPASS_BIT" does, exactly?

> +	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
> -		/* We do not switch off the checksumming on non TCP/UDP
> -		 * frames: as is shown from tests, the checksumming engine
> -		 * is smart enough to see that a frame is not actually TCP
> -		 * or UDP and then just pass it through without any changes
> -		 * to the frame.
> -		 */
> -		if (skb->protocol == htons(ETH_P_IP)) {
> +		if (ethertype == ETH_P_IP) {
>  			word1 |= TSS_IP_CHKSUM_BIT;
>  			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
>  		} else { /* IPv6 */
> 
> -- 
> 2.34.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ