[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <af73e5bb-c73d-4f7a-a118-97345e9512fe@kontron.de>
Date: Thu, 30 Jan 2025 11:36:52 +0100
From: Frieder Schrempf <frieder.schrempf@...tron.de>
To: Lukasz Majewski <lukma@...x.de>
Cc: Andrew Lunn <andrew@...n.ch>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: KSZ9477 HSR Offloading
On 30.01.25 11:19 AM, Frieder Schrempf wrote:
> Hi Lukasz,
>
> On 29.01.25 11:52 PM, Lukasz Majewski wrote:
>> Hi Frieder,
>>
>>> On 29.01.25 2:58 PM, Lukasz Majewski wrote:
>>>> Hi Frieder,
>>>>
>>>>> Hi Lukasz,
>>>>>
>>>>> On 29.01.25 12:17 PM, Lukasz Majewski wrote:
>>>>>> Hi Frieder,
>>>>>>
>>>>>>> On 29.01.25 8:24 AM, Frieder Schrempf wrote:
>>>>>>>> Hi Andrew,
>>>>>>>>
>>>>>>>> On 28.01.25 6:51 PM, Andrew Lunn wrote:
>>>>>>>>> On Tue, Jan 28, 2025 at 05:14:46PM +0100, Frieder Schrempf
>>>>>>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying out HSR support on KSZ9477 with v6.12. My setup
>>>>>>>>>> looks like this:
>>>>>>>>>>
>>>>>>>>>> +-------------+ +-------------+
>>>>>>>>>> | | | |
>>>>>>>>>> | Node A | | Node D |
>>>>>>>>>> | | | |
>>>>>>>>>> | | | |
>>>>>>>>>> | LAN1 LAN2 | | LAN1 LAN2 |
>>>>>>>>>> +--+-------+--+ +--+------+---+
>>>>>>>>>> | | | |
>>>>>>>>>> | +---------------+ |
>>>>>>>>>> | |
>>>>>>>>>> | +---------------+ |
>>>>>>>>>> | | | |
>>>>>>>>>> +--+-------+--+ +--+------+---+
>>>>>>>>>> | LAN1 LAN2 | | LAN1 LAN2 |
>>>>>>>>>> | | | |
>>>>>>>>>> | | | |
>>>>>>>>>> | Node B | | Node C |
>>>>>>>>>> | | | |
>>>>>>>>>> +-------------+ +-------------+
>>>>>>>>>>
>>>>>>>>>> On each device the LAN1 and LAN2 are added as HSR slaves.
>>>>>>>>>> Then I try to do ping tests between each of the HSR
>>>>>>>>>> interfaces.
>>>>>>>>>>
>>>>>>>>>> The result is that I can reach the neighboring nodes just
>>>>>>>>>> fine, but I can't reach the remote node that needs packages
>>>>>>>>>> to be forwarded through the other nodes. For example I can't
>>>>>>>>>> ping from node A to C.
>>>>>>>>>>
>>>>>>>>>> I've tried to disable HW offloading in the driver and then
>>>>>>>>>> everything starts working.
>>>>>>>>>>
>>>>>>>>>> Is this a problem with HW offloading in the KSZ driver, or am
>>>>>>>>>> I missing something essential?
>>>>>>
>>>>>> Thanks for looking and testing such large scale setup.
>>>>>>
>>>>>>>>>
>>>>>>>>> How are IP addresses configured? I assume you have a bridge,
>>>>>>>>> LAN1 and LAN2 are members of the bridge, and the IP address is
>>>>>>>>> on the bridge interface?
>>>>>>>>
>>>>>>>> I have a HSR interface on each node that covers LAN1 and LAN2 as
>>>>>>>> slaves and the IP addresses are on those HSR interfaces. For
>>>>>>>> node A:
>>>>>>>>
>>>>>>>> ip link add name hsr type hsr slave1 lan1 slave2 lan2
>>>>>>>> supervision 45 version 1
>>>>>>>> ip addr add 172.20.1.1/24 dev hsr
>>>>>>>>
>>>>>>>> The other nodes have the addresses 172.20.1.2/24, 172.20.1.3/24
>>>>>>>> and 172.20.1.4/24 respectively.
>>>>>>>>
>>>>>>>> Then on node A, I'm doing:
>>>>>>>>
>>>>>>>> ping 172.20.1.2 # neighboring node B works
>>>>>>>> ping 172.20.1.4 # neighboring node D works
>>>>>>>> ping 172.20.1.3 # remote node C works only if I disable
>>>>>>>> offloading
>>>>>>>
>>>>>>> BTW, it's enough to disable the offloading of the forwarding for
>>>>>>> HSR frames to make it work.
>>>>>>>
>>>>>>> --- a/drivers/net/dsa/microchip/ksz9477.c
>>>>>>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>>>>>>> @@ -1267,7 +1267,7 @@ int ksz9477_tc_cbs_set_cinc(struct
>>>>>>> ksz_device *dev, int port, u32 val)
>>>>>>> * Moreover, the NETIF_F_HW_HSR_FWD feature is also enabled, as
>>>>>>> HSR frames
>>>>>>> * can be forwarded in the switch fabric between HSR ports.
>>>>>>> */
>>>>>>> -#define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP |
>>>>>>> NETIF_F_HW_HSR_FWD)
>>>>>>> +#define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP)
>>>>>>>
>>>>>>> void ksz9477_hsr_join(struct dsa_switch *ds, int port, struct
>>>>>>> net_device *hsr)
>>>>>>> {
>>>>>>> @@ -1279,16 +1279,6 @@ void ksz9477_hsr_join(struct dsa_switch
>>>>>>> *ds, int port, struct net_device *hsr)
>>>>>>> /* Program which port(s) shall support HSR */
>>>>>>> ksz_rmw32(dev, REG_HSR_PORT_MAP__4, BIT(port),
>>>>>>> BIT(port));
>>>>>>>
>>>>>>> - /* Forward frames between HSR ports (i.e. bridge together
>>>>>>> HSR ports) */
>>>>>>> - if (dev->hsr_ports) {
>>>>>>> - dsa_hsr_foreach_port(hsr_dp, ds, hsr)
>>>>>>> - hsr_ports |= BIT(hsr_dp->index);
>>>>>>> -
>>>>>>> - hsr_ports |= BIT(dsa_upstream_port(ds, port));
>>>>>>> - dsa_hsr_foreach_port(hsr_dp, ds, hsr)
>>>>>>> - ksz9477_cfg_port_member(dev,
>>>>>>> hsr_dp->index, hsr_ports);
>>>>>>> - }
>>>>>>> -
>>>>>>> if (!dev->hsr_ports) {
>>>>>>> /* Enable discarding of received HSR frames */
>>>>>>> ksz_read8(dev, REG_HSR_ALU_CTRL_0__1, &data);
>>>>>>
>>>>>> This means that KSZ9477 forwarding is dropping frames when HW
>>>>>> acceleration is used (for non "neighbour" nodes).
>>>>>>
>>>>>> On my setup I only had 2 KSZ9477 devel boards.
>>>>>>
>>>>>> And as you wrote - the SW based one works, so extending
>>>>>> https://elixir.bootlin.com/linux/v6.12-rc2/source/tools/testing/selftests/net/hsr
>>>>>>
>>>>>> would not help in this case.
>>>>>
>>>>> I see. With two boards you can't test the accelerated forwarding.
>>>>> So how did you test the forwarding at all? Or are you telling me,
>>>>> that this was added to the driver without prior testing (which
>>>>> seems a bit bold and unusual)?
>>>>
>>>> The packet forwarding is for generating two frames copies on two HSR
>>>> coupled ports on a single KSZ9477:
>>>
>>> Isn't that what duplication aka NETIF_F_HW_HSR_DUP is for?
>>
>> As I mentioned - the NETIF_F_HW_HSR_DUP is to remove duplicated frames.
>>
>> NETIF_F_HW_HSR_FWD is to in-hw generate frame copy for HSR port to be
>> sent:
>> https://elixir.bootlin.com/linux/v6.13/source/drivers/net/dsa/microchip/ksz9477.c#L1252
>
> Ok, so you are using a different definition for the "forwarding". What
> puzzles me is the explanation for the HSR feature flags here [1]. They
> seem to suggest the following, which differs from your explanation:
>
> Forwarding (aka NETIF_F_HW_HSR_FWD):
>
> "Forwarding involves automatically forwarding between redundant ports in
> an HSR."
>
> This sounds more like the "forwarding" of a HSR frame within the Ring,
> between two HSR ports, that I was thinking of.
>
> Duplication (aka NETIF_F_HW_HSR_DUP):
>
> "Duplication involves the switch automatically sending a single frame
> from the CPU port to both redundant ports."
>
> This is contradictory to what you wrote above and sounds more
> reasonable. This is what you instead call forwarding above.
>
> Are you mixing things up, here? Am I completely on the wrong track? I'm
> just trying to understand the basics here.
>
>>>
>>>>
>>>> https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ApplicationNotes/ApplicationNotes/AN3474-KSZ9477-High-Availability-Seamless-Redundancy-Application-Note-00003474A.pdf
>>>>
>>>> The KSZ9477 chip also supports RX packet duplication removal, but
>>>> cannot guarantee 100% success (so as a fallback it is done in SW).
>>>>
>>>> The infrastructure from:
>>>> https://elixir.bootlin.com/linux/v6.13/source/tools/testing/selftests/net/hsr/hsr_redbox.sh#L50
>>>>
>>>> is enough to test HW accelerated forwarding (of KSZ9477) from NS1
>>>> and NS2.
>>>
>>> I'm not really sure if I get it. In this setup NS1 and NS2 are
>>> connected via HSR link (two physical links). On one side packets are
>>> sent duplicated on both physical ports. On the receiving side the
>>> duplication is removed and one packet is forwarded to the CPU.
>>>
>>> Where is forwarding involved here?
>>
>> In-HW forwarding is when KSZ9477 duplicates frame to be send on second
>> HSR aware port.
>>
>> (only 2 of them can be coupled to have in-hw support for duplication
>> and forwarding. Creating more hsr "interfaces" would just use SW).
>>
>>> Isn't forwarding only for cases
>>> with one intermediate node between the sending and receiving node?
>>
>> This kind of "forwarding" is done in software in the hsr driver.
>
> But according to the official explanation of the flags [1] this kind of
> forwarding is exactly what NETIF_F_HW_HSR_FWD seems to be about.
>
>>
>>>
>>>>
>>>>>
>>>>> Anyway, do you have any suggestions for debugging this?
>>>>> Unfortunately I know almost nothing about this topic. But I can
>>>>> offer to test on my setup, at least for now. I don't know how long
>>>>> I will still have access to the hardware.
>>>>
>>>> For some reason only frames to neighbours are delivered.
>>>>
>>>> So those are removed at some point (either in KSZ9477 HW or in HSR
>>>> driver itself).
>>>>
>>>> Do you have some dumps from tshark/wireshark to share?
>>>>
>>>>>
>>>>> If we can't find a proper solution in the long run, I will probably
>>>>> send a patch to disable the accelerated forwarding to at least make
>>>>> HSR work by default.
>>>>
>>>> As I've noted above - the HW accelerated forwarding is in the
>>>> KSZ9477 chip.
>>>
>>> Yeah, but if the HW accelerated forwarding doesn't work
>>
>> The "forwarding" in KSZ9477 IC works OK, as frames are duplicated (i.e.
>> forwarded) to both HSR coupled ports.
>
> No, I don't think duplication and forwarding refer to the same thing
> here. At least it doesn't make sense for me.
>
>> The problem is with dropping frames travelling in connected KSZ9477
>> devices.
>
> I'm not really sure.
>
>>
>>> it would be
>>> better to use no acceleration and have it work in SW at least by
>>> default, right?
>>
>> IMHO, it would be best to fix the code.
>
> Agreed.
>
>>>
>>>>
>>>> The code which you uncomment, is following what I've understood from
>>>> the standard (and maybe the bug is somewhere there).
>>>
>>> Ok, thanks for explaining. I will see if I can find some time to
>>> gather some more information on the problem.
>>
>> That would be very helpful. Thanks in advance for it.
> One more information I just found out: I can leave ksz9477_hsr_join()
> untouched and only remove the feature flags from
> KSZ9477_SUPPORTED_HSR_FEATURES to make things work.
>
> Broken:
>
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP |
> NETIF_F_HW_HSR_FWD)
>
> Works:
>
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP)
>
> Works:
>
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_FWD)
>
> Works:
>
> #define KSZ9477_SUPPORTED_HSR_FEATURES (0)
Please take this with a grain of salt. I've just seen, that the behavior
is not 100% consistent.
I'm pretty sure that things work reliably with the diff I posted before
as I have tested this case more thoroughly.
Powered by blists - more mailing lists