Message-ID: <CAJq09z48A7Y6p=yNocUv17Ji1AfSuP4e6MdT1tNDY0Pfz_Om=A@mail.gmail.com>
Date: Mon, 31 Jan 2022 14:26:30 -0300
From: Luiz Angelo Daros de Luca <luizluca@...il.com>
To: Florian Fainelli <f.fainelli@...il.com>
Cc: Frank Wunderlich <frank-w@...lic-files.de>,
Alvin Šipraga <ALSI@...g-olufsen.dk>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linus.walleij@...aro.org" <linus.walleij@...aro.org>,
"andrew@...n.ch" <andrew@...n.ch>,
"vivien.didelot@...il.com" <vivien.didelot@...il.com>,
"olteanv@...il.com" <olteanv@...il.com>,
"arinc.unal@...nc9.com" <arinc.unal@...nc9.com>
Subject: Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb: multiple
cpu ports, non cpu extint
> On 1/29/2022 8:42 PM, Luiz Angelo Daros de Luca wrote:
> >>> I suggested it might be checksum problem because I'm also affected. In
> >>> my case, I have an mt7620a SoC connected to the rtl8367s switch. The
> >>> OS offloads checksum to HW but the mt7620a cannot calculate the
> >>> checksum with the (EtherType) Realtek CPU Tag in place. I'll try to
> >>> move the CPU tag to test if the mt7620a will then digest the frame
> >>> correctly.
> >>
> >> I implemented a new DSA tag (rtl8_4t, with "t" as in trailing) that
> >> puts the DSA tag before the Ethernet CRC (the switch supports both).
> >> With no tag in the mac layer, mediatek correctly calculated the ip
> >> checksum. However, mediatek SoC included the extra bytes from the DSA
> >> tag in the TCP checksum, even if they are after the ip length.
> >>
> >> This is the packet leaving the OS:
> >>
> >> 0000 04 0e 3c fc 4f aa 50 d4 f7 33 15 8a 08 00 45 10
> >> 0010 00 3c 00 00 40 00 40 06 b7 58 c0 a8 01 01 c0 a8
> >> 0020 01 02 00 16 a1 50 80 da 39 e9 b2 2a 23 cf a0 12
> >> 0030 fe 88 83 82 00 00 02 04 05 b4 04 02 08 0a 01 64
> >> 0040 fb 28 66 42 e0 79 01 03 03 03 88 99 04 00 00 20
> >> 0050 00 08
> >>
> >> The TCP checksum is at offset 0x0032, with value 0x8382.
> >> The DSA tag is at offset 0x4a, with value 8899040000200008.
> >>
> >> This is what arrived at the other end:
> >>
> >> 0000 04 0e 3c fc 4f aa 50 d4 f7 33 15 8a 08 00 45 10
> >> 0010 00 3c 00 00 40 00 40 06 b7 58 c0 a8 01 01 c0 a8
> >> 0020 01 02 00 16 a1 50 80 da 39 e9 b2 2a 23 cf a0 12
> >> 0030 fe 88 c3 e8 00 00 02 04 05 b4 04 02 08 0a 01 64
> >> 0040 fb 28 66 42 e0 79 01 03 03 03
> >>
> >> TCP checksum is 0xc3e8, but the correct one should be 0x50aa
> >> If you calculate tcp checksum including 8899040000200008, you'll get exactly
> >> 0xc3e8 (I did the math).
> >>
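
The arithmetic above can be reproduced with a short Python sketch. The helper name csum16 is made up for illustration, and the byte strings are copied from the two dumps. It assumes the offloading HW simply sums everything from the start of the TCP header (offset 0x22 here) to the end of the buffer, with the pseudo-header sum 0x8382 already sitting in the checksum field.

```python
def csum16(data: bytes) -> int:
    """Return the folded 16-bit one's-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

# Pseudo-header: source IP, destination IP, zero + protocol (6), TCP length (40).
pseudo = bytes.fromhex("c0a80101" "c0a80102" "0006" "0028")

# TCP header and options as they leave the OS (offsets 0x22..0x49 of the dump),
# with the partial checksum 0x8382 still in the checksum field.
tcp = bytes.fromhex(
    "0016a15080da39e9b22a23cfa012fe88"
    "83820000020405b40402080a0164fb28"
    "6642e07901030303"
)
tag = bytes.fromhex("8899040000200008")  # trailing tag found at offset 0x4a

partial = csum16(pseudo)            # what the stack leaves for the HW
good = ~csum16(tcp) & 0xFFFF        # sum stops at the end of the TCP segment
bad = ~csum16(tcp + tag) & 0xFFFF   # sum also runs over the trailing tag

print(f"partial=0x{partial:04x} good=0x{good:04x} bad=0x{bad:04x}")
# prints: partial=0x8382 good=0x50aa bad=0xc3e8
```

The three values match the ones quoted above: 0x8382 is the pseudo-header sum, 0x50aa is the checksum a receiver expects, and 0xc3e8 is what results when the eight tag bytes are summed as if they were payload.
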
> >> So, if we use a trailing DSA tag, we can leave the IP checksum offloading on
> >> and just turn off the TCP checksum offload. Is it worth it?
> >
> > No, IP checksum is always done in SW.
> >
> >> Is it still interesting to have the rtl8_4t merged?
> >
> > Maybe it is. It has uncovered a problem. The case of trailing tags
> > seems to be unsolvable even with csum_start. AFAIK, the driver must
> > cksum from "skb->csum_start up to the end". When the switch is using
> > an incompatible tag, we have:
> >
> > slave(): my features, copied from master, tell me I can offload
> >          the checksum. Do nothing.
> > tagger(): add the tag to the end of the skb
> > master(): offloading HW, checksum from csum_start until the end,
> >           including the added tag
> > switch(): remove the tag, forward to the network
> > remote_client(): I got a packet with a broken checksum.
>
> This is unfortunately expected here, because you program the hardware
> with the full Ethernet frame length which does include the trailer tag,
> and it then uses that length to calculate the transport header checksum
> over the entire payload, thinking the trailer tag is the UDP/TCP payload.
>
> The checksum is calculated "on the fly" as part of the DMA operation to
> send the packet on the wire, so you cannot decouple the checksum
> calculation from the DMA operation, other than by asking the HW *not
> to* checksum the packet, and instead having software provide that.
>
> Now looking at the datasheet you quoted, there is this:
>
> 241. FE_GLO_CFG: Frame Engine Global Configuration (offset: 0x0000)
>
> 7:4 RW L2_SPACE L2 Space
> (unit: 8 bytes)
> 0xB
>
> Can you play with this and see if you can account for the extra 8 bytes
> added by the Realtek tag?
>
I played with it, both with L2_SPACE and RATE_MINUS:

  FE_GLO_CFG_REG=0x10100000
  FE_GLO_CFG_SIZE=32
  # Save the original value so it can be restored after the loop.
  FE_GLO_CFG_ORIG=$(($(devmem $FE_GLO_CFG_REG $FE_GLO_CFG_SIZE)))
  for l2space_sig in b0 b1 c0 c1 d0 d1 e0 e1 a0 a1 90 91 80 81 70 71 60 \
                     61 50 51 40 41 30 31 20 21 10 11 01 00 e0 e1; do
      FE_GLO_CFG=$(($(devmem $FE_GLO_CFG_REG $FE_GLO_CFG_SIZE)))
      printf 'Before FE_GLO_CFG = 0x%X\n' $FE_GLO_CFG
      # Replace only the low byte of the register with the value under test.
      devmem $FE_GLO_CFG_REG $FE_GLO_CFG_SIZE \
          $(( (FE_GLO_CFG & ~0x000000ff) | 0x$l2space_sig ))
      FE_GLO_CFG=$(($(devmem $FE_GLO_CFG_REG $FE_GLO_CFG_SIZE)))
      printf 'After FE_GLO_CFG = 0x%X\n' $FE_GLO_CFG
      echo "Please test L2_SPACE_sig==$l2space_sig"; read
  done
  devmem $FE_GLO_CFG_REG $FE_GLO_CFG_SIZE $FE_GLO_CFG_ORIG

It only made a difference for the values 0x0 and 0xf, but that looks
more like an overflow, and it only affected the traffic I receive, not
the traffic I send. The remote endpoint always receives 0x8382 as the
TCP checksum, which is the "fake IP header" (pseudo-header) sum.

The default value is 0xb (11) and the docs say it is in 8-byte units.
What is 11 * 8 bytes? 88 bytes? Maybe the docs are wrong.
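
To make the field arithmetic explicit, here is a small Python sketch of how bits 7:4 would be read and rewritten, assuming the datasheet layout quoted above. The function names and the sample register value are hypothetical. Only the field position (7:4), the 8-byte unit and the default of 0xb come from the datasheet excerpt.

```python
L2_SPACE_SHIFT = 4                      # field occupies bits 7:4 (quoted datasheet)
L2_SPACE_MASK = 0xF << L2_SPACE_SHIFT
L2_SPACE_UNIT = 8                       # bytes per unit (quoted datasheet)

def get_l2_space(reg: int) -> int:
    """Extract the 4-bit L2_SPACE field from a register value."""
    return (reg & L2_SPACE_MASK) >> L2_SPACE_SHIFT

def set_l2_space(reg: int, value: int) -> int:
    """Return reg with L2_SPACE replaced, leaving all other bits untouched."""
    if not 0 <= value <= 0xF:
        raise ValueError("L2_SPACE is a 4-bit field")
    return (reg & ~L2_SPACE_MASK) | (value << L2_SPACE_SHIFT)

reg = 0x000000B0                        # hypothetical value holding the default field
print(get_l2_space(reg), get_l2_space(reg) * L2_SPACE_UNIT)  # prints: 11 88
print(hex(set_l2_space(reg, 0xC)))                           # prints: 0xc0
```

Read literally, the default therefore decodes to 88 bytes, which is the figure questioned above.
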
That same register also has EXT_VLAN, which points to 0x8100 (802.1Q ethertype).
In the same doc, there is also a mention of L2 space usage, related
only to received traffic:
"1. RX_CTRL pass through VLAN tags on L2 space (at most 2 tags)" (pages 245-247)
Anyway, even if the Mediatek switch could remove the Realtek tag, it
should not do that. The Realtek switch still needs it.
> > ndo_features_check() will not help because, either in HW or SW, it is
> > expected to calculate the checksum up to the end. However, there is no
> > csum_end or csum_len. I don't know if HW offloading will support some
> > kind of csum_end but it would not be a problem in SW (considering
> > skb_checksum_help() is adapted to something like skb_checksum_trimmed
> > without the clone).
> >
> > That amount of bytes to ignore at the end is a complex question: the
> > driver either needs some hint (like it happens with skb->csum_offset)
> > to know where transport payload ends or the taggers (or the dsa) must
> > save the amount of extra bytes (or which tags were added) in the
> > sk_buff. With that info, the driver can check if HW will work with a
> > different csum_start / csum_end or if only a supported tag is in use.
I must be missing something. Is the SW TCP checksum really broken when a
trailing tag is in use? If so, checksums will only work if TCP checksum
offload is enabled on compatible HW. Anything else, like different
vendors, software checksum or stacked tags, will be broken.
> > In my case, using an incompatible trailing tag, I just made it work by
> > hacking dsa and forcing slave interfaces to disable offloading. This
> > way, checksum is calculated before any tag is added and offloading is
> > skipped. But it is not a real solution.
>
> Not sure which one is not a "real solution", but for this specific
> combination of DSA conduit driver and switch tag, you have to disable
> checksum offload in the conduit driver and provide it in software. The
> other way would be to configure the realtek switch to work with
> DSA_TAG_8021Q and see if you can continue to offload the data path since
> tagging would use regular 802.1Q vlans, but that means you are going to
> lose a whole lot of management functionality offered by the native
> Realtek tag.
Definitely not a real solution. It was just a hack to check whether
checksumming at the slave device would overcome the issue. As I said,
simply disabling checksum offload and doing it in SW "as usual" is not
enough, because the SW checksum also sums to the end. Either we need to
parse each possible transport layer to find its end, or taggers must
hint how many bytes to ignore, with something like a new
skb->cksum_stop_before_end. Another solution would be to hint to the
slave interface that it needs to checksum right there (modifying
slave->vlan_features). None of that exists today. Is it the right way?
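
A minimal Python sketch of the "bytes to ignore" idea, using the tagged frame from the dump earlier in the thread. Nothing like this exists in the kernel today: checksum_help_trimmed and stop_before_end are hypothetical names standing in for an adapted skb_checksum_help() and the proposed skb field, and csum16 is an illustrative helper.

```python
def csum16(data: bytes) -> int:
    """Return the folded 16-bit one's-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum_help_trimmed(frame: bytearray, csum_start: int, csum_offset: int,
                          stop_before_end: int = 0) -> None:
    """Finish a partial checksum in place, skipping trailing tag bytes.

    The checksum field is expected to already hold the pseudo-header sum.
    """
    end = len(frame) - stop_before_end
    csum = ~csum16(bytes(frame[csum_start:end])) & 0xFFFF
    pos = csum_start + csum_offset
    frame[pos:pos + 2] = csum.to_bytes(2, "big")

# Frame as it leaves the OS, including the 8-byte trailing tag.
FRAME = bytes.fromhex(
    "040e3cfc4faa50d4f733158a08004510"
    "003c000040004006b758c0a80101c0a8"
    "01020016a15080da39e9b22a23cfa012"
    "fe8883820000020405b40402080a0164"
    "fb286642e07901030303889904000020"
    "0008"
)
CSUM_START, CSUM_OFFSET = 0x22, 16   # TCP header start, checksum field offset

to_end = bytearray(FRAME)
checksum_help_trimmed(to_end, CSUM_START, CSUM_OFFSET)
trimmed = bytearray(FRAME)
checksum_help_trimmed(trimmed, CSUM_START, CSUM_OFFSET, stop_before_end=8)

print(to_end[0x32:0x34].hex(), trimmed[0x32:0x34].hex())  # prints: c3e8 50aa
```

Summing to the end reproduces the broken 0xc3e8, while stopping 8 bytes early gives the expected 0x50aa. The open question remains who would supply that byte count once several taggers are stacked.
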
--
Luiz