netdev - Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb: multiple cpu ports, non cpu extint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e5f1f484-5682-6338-f974-23e77f415018@arinc9.com>
Date:   Tue, 25 Jan 2022 12:44:03 +0300
From:   Arınç ÜNAL <arinc.unal@...nc9.com>
To:     Vladimir Oltean <olteanv@...il.com>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Florian Fainelli <f.fainelli@...il.com>,
        Luiz Angelo Daros de Luca <luizluca@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        Frank Wunderlich <frank-w@...lic-files.de>,
        Alvin Šipraga <ALSI@...g-olufsen.dk>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linus.walleij@...aro.org" <linus.walleij@...aro.org>,
        "vivien.didelot@...il.com" <vivien.didelot@...il.com>
Subject: Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb: multiple
 cpu ports, non cpu extint

On 25/01/2022 01:30, Vladimir Oltean wrote:
> On Mon, Jan 24, 2022 at 01:42:42PM -0800, Jakub Kicinski wrote:
>> On Mon, 24 Jan 2022 22:56:07 +0200 Vladimir Oltean wrote:
>>> On Mon, Jan 24, 2022 at 11:38:12AM -0800, Jakub Kicinski wrote:
>>>> On Mon, 24 Jan 2022 21:08:45 +0200 Vladimir Oltean wrote:
>>>>> So before we declare that any given Ethernet driver is buggy for declaring
>>>>> NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM and not checking that skb->csum_start
>>>>> points where it expects it to (taking into consideration potential VLAN
>>>>> headers, IPv6 extension headers),
>>>>
>>>> Extension headers are explicitly not supported by NETIF_F_IPV6_CSUM.
>>>>
>>>> IIRC Tom's hope was to delete NETIF_F_IP*_CSUM completely once all
>>>> drivers are converted to parsing and therefore can use NETIF_F_HW_CSUM.
>>>
>>> IIUC, NETIF_F_IP*_CSUM vs NETIF_F_HW_CSUM doesn't make that big of a
>>> difference in terms of what the driver should check for, if the hardware
>>> checksum offload engine can't directly be given the csum_start and
>>> csum_offset, wherever they may be.
>>>
>>>>> is there any driver that _does_ perform these checks correctly, that
>>>>> could be used as an example?
>>>>
>>>> I don't think so. Let me put it this way - my understanding is that up
>>>> until now we had been using the vlan_features, mpls_features etc to
>>>> perform L2/L2.5/below-IP feature stripping. This scales poorly to DSA
>>>> tags, as discussed in this thread.
>>>>
>>>> I'm suggesting we extend the kind of checking we already do to work
>>>> around inevitable deficiencies of device parsers for tunnels to DSA
>>>> tags.
>>>
>>> Sorry, I'm very tired and I probably don't understand what you're
>>> saying, so excuse the extra clarification questions.
>>>
>>> The typical protocol checking that drivers with NETIF_F_HW_CSUM do seems
>>> to be based on vlan_get_protocol()/skb->protocol/skb_network_header()/
>>> skb_transport_header() values, all of which make DSA invisible. So they
>>> don't work if the underlying hardware really doesn't like seeing an
>>> unexpected DSA header.
>>>
>>> When you say "I'm suggesting we extend the kind of checking we already do",
>>> do you mean we should modify the likes of e1000e and igb such that, if
>>> they're ever used as DSA masters, they do a full header parse of the
>>> packet (struct ethhdr :: h_proto, check if VLAN, struct iphdr/ipv6hdr,
>>> etc.) instead of the current logic?
>>
>> That was my thinking, yes. The exact amount of work depends on the
>> driver, I believe that more recent Intel parts (igb, ixgbe and newer)
>> pass a L3 offset to the HW. They treat L2 as opaque, ergo no patches
>> needed. At a glance e1000e passes the full skb_checksum_start_offset()
>> to HW, so likely even better.
> 
> Ah, right, I missed that. I agree that a driver that uses
> skb->csum_start is very likely to work unmodified with DSA headers
> (not trailers). I didn't notice this in e1000 because I was just
> searching for csum_start.
> 
>> It's only drivers for devices which actually want to parse the Ethertype
>> that would need extra checks. (Coincidentally such devices can't support
>> MPLS given the lack of L3 indication in the frame.)
>>
>>> It will be pretty convoluted unless
>>> we have some helper. Because if I follow through, for a DSA-tagged IP
>>> packet on xmit, skb->protocol is certainly htons(ETH_P_IP):
>>>
>>> ntohs(skb->protocol) = 0x800, csum_offset = 16, csum_start = 280, skb_checksum_start_offset = 54, skb->network_header = 260, skb_network_header_len = 20
>>>
>>> skb_dump output:
>>> skb len=94 headroom=226 headlen=94 tailroom=384
>>> mac=(226,34) net=(260,20) trans=280
>>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=1 segs=1))
>>> csum(0x100118 ip_summed=3 complete_sw=0 valid=0 level=0)
>>> hash(0x7710ee84 sw=0 l4=1) proto=0x0800 pkttype=0 iif=0
>>> dev name=eno2 feat=0x00020100001149a9
>>> sk family=2 type=1 proto=6
>>> skb headroom: 00000000: 6c 00 03 02 64 65 76 00 fe ed ca fe 28 00 00 00
>>> ...(junk)...
>>> skb headroom: 000000e0: 5f 43
>>>                          20 byte DSA tag
>>>                          |
>>>                          v
>>> skb linear:   00000000: 88 80 00 0a 80 00 00 00 00 00 00 00 08 00 30 00
>>>                                      skb_mac_header()
>>>                                      |
>>>                                      v
>>> skb linear:   00000010: 00 00 00 00 68 05 ca 92 af 20 00 04 9f 05 f6 28
>>>                                skb_network_header()
>>>                                |
>>>                                v
>>> skb linear:   00000020: 08 00 45 00 00 3c 26 47 40 00 40 06 00 49 0a 00
>>>                                            skb_checksum_start_offset
>>>                                            |
>>>                                            |                       csum_offset
>>>                                            v                       v
>>> skb linear:   00000030: 00 2c 0a 00 00 01 b6 08 14 51 11 1f 91 4f 00 00
>>> skb linear:   00000040: 00 00 a0 02 fa f0 14 5b 00 00 02 04 05 b4 04 02
>>> skb linear:   00000050: 08 0a 2e 00 e5 b8 00 00 00 00 01 03 03 07
>>
>> Oof, so in this case the DSA tag is _before_ the skb_mac_header()?
>> Or the prepend is supposed to be parsable as a Ethernet header?
>> Seems like any device that can do csum over this packet must already
>> use L3/L4 offsets or have explicit knowledge of DSA, right?
> 
> I'm sorry, there's a mistake, skb_mac_header() points to the DSA tag
> here (and skb_mac_header_len is 34), I wanted to say "real MAC header"
> but failed to do so. This case shouldn't pose any special problems.
> 
>>> I don't know, I just don't expect that non-DSA users of those drivers
>>> will be very happy about such changes. Do these existing protocol
>>> checking schemes qualify as buggy?
>>
>> Unfortunate reality of the checksum offloads is that most drivers for
>> devices which parse on Tx are buggy, it's more of a question of whether
>> anyone tried to use an unsupported protocol stack :( Recent example
>> that comes to mind is 1698d600b361 ("bnxt_en: Implement
>> .ndo_features_check().").
> 
> Nice hook this ndo_features_check! In my reading of validate_xmit_skb()
> I went right past it.
> 
>>> If this is the convention that we want to enforce, then I can't really
>>> help Luiz with fixing the OpenWRT mtk_eth_soc.c - he'll have to figure
>>> out a way to parse the packets for which his hardware will accept the
>>> checksumming offload, and call skb_checksum_help() otherwise.
>>>
>>>> We can come up with various schemes of expressing capabilities
>>>> between underlying driver and tag driver. I'm not aware of similar
>>>> out-of-band schemes existing today so it'd be "DSA doing it's own
>>>> thing", which does not seem great.
>>>
>>> It at least seems less complex to me, and less checking in the fast path
>>> if I understand everything that's been said correctly.
>>
>> I understand, I'm primarily trying to share some context and prior work.
>> I don't mean to nack all other approaches.
>>
>> I believe writing a parser matching the device behavior would be easier
>> for a driver author than interpreting the runes of our csum offload API
>> and getting thru the thicket of all the bits. If that's not the case my
>> argument is likely defeated.
> 
> Ok, writing a parser might be needed if the DSA master is going, for
> some reason, to support TX checksum offloading with some DSA headers but
> not with others.
> 
> If that is not the case, and that Ethernet controller simply doesn't
> support TX checksumming unless it's the plain old Ethernet {+ VLAN} +
> IP/IPv6 + TCP/UDP, then this blanket patch below should fix the problem
> almost elegantly, and parsing is a useless complication (warning, not
> even compile-tested!):

I tried it on the upstream mtk_eth_soc on an mt7621a board with an 
rtl8367s switch connected to the 2nd GMAC of the SoC running 5.17-rc1.

Although "ethtool --show-offload eth1" does not show anything different, 
I can now ssh to the device (only way that came into my mind to quickly 
check TCP traffic).

--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1986,6 +1986,16 @@ static int mtk_hwlro_get_fdir_all(struct
  	return 0;
  }

+static netdev_features_t fe_features_check(struct sk_buff *skb,
+					   struct net_device *dev,
+					   netdev_features_t features)
+{
+	if (netdev_uses_dsa(dev))
+		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
+
+	return features;
+}
+
  static netdev_features_t mtk_fix_features(struct net_device *dev,
  					  netdev_features_t features)
  {
@@ -2906,6 +2916,7 @@ static const struct net_device_ops mtk_n
  #ifdef CONFIG_NET_POLL_CONTROLLER
  	.ndo_poll_controller	= mtk_poll_controller,
  #endif
+	.ndo_features_check	= fe_features_check,
  	.ndo_setup_tc		= mtk_eth_setup_tc,
  };


> 
> -----------------------------[ cut here ]-----------------------------
>  From 5ef3d3cd8441d756933558212f518f48754c64d9 Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <vladimir.oltean@....com>
> Date: Tue, 25 Jan 2022 00:16:57 +0200
> Subject: [PATCH] ramips: ethernet: ralink: disable TX checksumming on DSA
>   masters
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
>   .../files/drivers/net/ethernet/ralink/mtk_eth_soc.c  | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c b/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
> index e07e5ed5a8f8..6ed9bc5942fd 100644
> --- a/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
> +++ b/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
> @@ -31,6 +31,7 @@
>   #include <linux/io.h>
>   #include <linux/bug.h>
>   #include <linux/netfilter.h>
> +#include <net/dsa.h>
>   #include <net/netfilter/nf_flow_table.h>
>   #include <linux/of_gpio.h>
>   #include <linux/gpio.h>
> @@ -1497,6 +1498,16 @@ static int fe_change_mtu(struct net_device *dev, int new_mtu)
>   	return fe_open(dev);
>   }
>   
> +static netdev_features_t fe_features_check(struct sk_buff *skb,
> +					   struct net_device *dev,
> +					   netdev_features_t features)
> +{
> +	if (netdev_uses_dsa(dev))
> +		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
> +
> +	return features;
> +}
> +
>   static const struct net_device_ops fe_netdev_ops = {
>   	.ndo_init		= fe_init,
>   	.ndo_uninit		= fe_uninit,
> @@ -1514,6 +1525,7 @@ static const struct net_device_ops fe_netdev_ops = {
>   #ifdef CONFIG_NET_POLL_CONTROLLER
>   	.ndo_poll_controller	= fe_poll_controller,
>   #endif
> +	.ndo_features_check	= fe_features_check,
>   };
>   
>   static void fe_reset_pending(struct fe_priv *priv)
> -----------------------------[ cut here ]-----------------------------
> 
> This is essentially what Florian said ages ago, it just took me a very
> long time to process. I guess what hadn't fully clicked in my head is
> that the TX checksumming offload being functional is more a matter of
> telling the hardware what are the L3 and L4 offsets, and the csum_offset,
> rather than it requiring any particular understanding of the DSA header.
> In turn, it means that for "nice" Ethernet controller implementations
> where that is the case, it would be actively detrimential to add a new
> .ndo_get_dsa_features() or something like that - because such a driver
> would report that it supports all DSA header-type formats (trailers are
> still broken as long as there isn't a csum_end). And keeping that kind
> of driver in sync with all DSA protocols that appear will become a
> repetitive task.
> 
> So crisis averted, I guess?
> Thanks a lot to both of you for the patient explanations!
> I retract my proposal for a new ndo and also suggest that the DSA master
> driver takes care to not leave as zero a TX checksum it can't offload.