lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <af73e5bb-c73d-4f7a-a118-97345e9512fe@kontron.de>
Date: Thu, 30 Jan 2025 11:36:52 +0100
From: Frieder Schrempf <frieder.schrempf@...tron.de>
To: Lukasz Majewski <lukma@...x.de>
Cc: Andrew Lunn <andrew@...n.ch>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: KSZ9477 HSR Offloading

On 30.01.25 11:19 AM, Frieder Schrempf wrote:
> Hi Lukasz,
> 
> On 29.01.25 11:52 PM, Lukasz Majewski wrote:
>> Hi Frieder,
>>
>>> On 29.01.25 2:58 PM, Lukasz Majewski wrote:
>>>> Hi Frieder,
>>>>   
>>>>> Hi Lukasz,
>>>>>
>>>>> On 29.01.25 12:17 PM, Lukasz Majewski wrote:  
>>>>>> Hi Frieder,
>>>>>>     
>>>>>>> On 29.01.25 8:24 AM, Frieder Schrempf wrote:    
>>>>>>>> Hi Andrew,
>>>>>>>>
>>>>>>>> On 28.01.25 6:51 PM, Andrew Lunn wrote:      
>>>>>>>>> On Tue, Jan 28, 2025 at 05:14:46PM +0100, Frieder Schrempf
>>>>>>>>> wrote:      
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm trying out HSR support on KSZ9477 with v6.12. My setup
>>>>>>>>>> looks like this:
>>>>>>>>>>
>>>>>>>>>> +-------------+         +-------------+
>>>>>>>>>> |             |         |             |
>>>>>>>>>> |   Node A    |         |   Node D    |
>>>>>>>>>> |             |         |             |
>>>>>>>>>> |             |         |             |
>>>>>>>>>> | LAN1   LAN2 |         | LAN1   LAN2 |
>>>>>>>>>> +--+-------+--+         +--+------+---+
>>>>>>>>>>    |       |               |      |
>>>>>>>>>>    |       +---------------+      |
>>>>>>>>>>    |                              |
>>>>>>>>>>    |       +---------------+      |
>>>>>>>>>>    |       |               |      |
>>>>>>>>>> +--+-------+--+         +--+------+---+
>>>>>>>>>> | LAN1   LAN2 |         | LAN1   LAN2 |
>>>>>>>>>> |             |         |             |
>>>>>>>>>> |             |         |             |
>>>>>>>>>> |   Node B    |         |   Node C    |
>>>>>>>>>> |             |         |             |
>>>>>>>>>> +-------------+         +-------------+
>>>>>>>>>>
>>>>>>>>>> On each device the LAN1 and LAN2 are added as HSR slaves.
>>>>>>>>>> Then I try to do ping tests between each of the HSR
>>>>>>>>>> interfaces.
>>>>>>>>>>
>>>>>>>>>> The result is that I can reach the neighboring nodes just
>>>>>>>>>> fine, but I can't reach the remote node that needs packages
>>>>>>>>>> to be forwarded through the other nodes. For example I can't
>>>>>>>>>> ping from node A to C.
>>>>>>>>>>
>>>>>>>>>> I've tried to disable HW offloading in the driver and then
>>>>>>>>>> everything starts working.
>>>>>>>>>>
>>>>>>>>>> Is this a problem with HW offloading in the KSZ driver, or am
>>>>>>>>>> I missing something essential?      
>>>>>>
>>>>>> Thanks for looking and testing such large scale setup.
>>>>>>     
>>>>>>>>>
>>>>>>>>> How are IP addresses configured? I assume you have a bridge,
>>>>>>>>> LAN1 and LAN2 are members of the bridge, and the IP address is
>>>>>>>>> on the bridge interface?      
>>>>>>>>
>>>>>>>> I have a HSR interface on each node that covers LAN1 and LAN2 as
>>>>>>>> slaves and the IP addresses are on those HSR interfaces. For
>>>>>>>> node A:
>>>>>>>>
>>>>>>>> ip link add name hsr type hsr slave1 lan1 slave2 lan2
>>>>>>>> supervision 45 version 1
>>>>>>>> ip addr add 172.20.1.1/24 dev hsr
>>>>>>>>
>>>>>>>> The other nodes have the addresses 172.20.1.2/24, 172.20.1.3/24
>>>>>>>> and 172.20.1.4/24 respectively.
>>>>>>>>
>>>>>>>> Then on node A, I'm doing:
>>>>>>>>
>>>>>>>> ping 172.20.1.2 # neighboring node B works
>>>>>>>> ping 172.20.1.4 # neighboring node D works
>>>>>>>> ping 172.20.1.3 # remote node C works only if I disable
>>>>>>>> offloading      
>>>>>>>
>>>>>>> BTW, it's enough to disable the offloading of the forwarding for
>>>>>>> HSR frames to make it work.
>>>>>>>
>>>>>>> --- a/drivers/net/dsa/microchip/ksz9477.c
>>>>>>> +++ b/drivers/net/dsa/microchip/ksz9477.c
>>>>>>> @@ -1267,7 +1267,7 @@ int ksz9477_tc_cbs_set_cinc(struct
>>>>>>> ksz_device *dev, int port, u32 val)
>>>>>>>   * Moreover, the NETIF_F_HW_HSR_FWD feature is also enabled, as
>>>>>>> HSR frames
>>>>>>>   * can be forwarded in the switch fabric between HSR ports.
>>>>>>>   */
>>>>>>> -#define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP |
>>>>>>> NETIF_F_HW_HSR_FWD)
>>>>>>> +#define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP)
>>>>>>>
>>>>>>>  void ksz9477_hsr_join(struct dsa_switch *ds, int port, struct
>>>>>>> net_device *hsr)
>>>>>>>  {
>>>>>>> @@ -1279,16 +1279,6 @@ void ksz9477_hsr_join(struct dsa_switch
>>>>>>> *ds, int port, struct net_device *hsr)
>>>>>>>         /* Program which port(s) shall support HSR */
>>>>>>>         ksz_rmw32(dev, REG_HSR_PORT_MAP__4, BIT(port),
>>>>>>> BIT(port));
>>>>>>>
>>>>>>> -       /* Forward frames between HSR ports (i.e. bridge together
>>>>>>> HSR ports) */
>>>>>>> -       if (dev->hsr_ports) {
>>>>>>> -               dsa_hsr_foreach_port(hsr_dp, ds, hsr)
>>>>>>> -                       hsr_ports |= BIT(hsr_dp->index);
>>>>>>> -
>>>>>>> -               hsr_ports |= BIT(dsa_upstream_port(ds, port));
>>>>>>> -               dsa_hsr_foreach_port(hsr_dp, ds, hsr)
>>>>>>> -                       ksz9477_cfg_port_member(dev,
>>>>>>> hsr_dp->index, hsr_ports);
>>>>>>> -       }
>>>>>>> -
>>>>>>>         if (!dev->hsr_ports) {
>>>>>>>                 /* Enable discarding of received HSR frames */
>>>>>>>                 ksz_read8(dev, REG_HSR_ALU_CTRL_0__1, &data);    
>>>>>>
>>>>>> This means that KSZ9477 forwarding is dropping frames when HW
>>>>>> acceleration is used (for non "neighbour" nodes).
>>>>>>
>>>>>> On my setup I only had 2 KSZ9477 devel boards.
>>>>>>
>>>>>> And as you wrote - the SW based one works, so extending
>>>>>> https://elixir.bootlin.com/linux/v6.12-rc2/source/tools/testing/selftests/net/hsr
>>>>>>
>>>>>> would not help in this case.    
>>>>>
>>>>> I see. With two boards you can't test the accelerated forwarding.
>>>>> So how did you test the forwarding at all? Or are you telling me,
>>>>> that this was added to the driver without prior testing (which
>>>>> seems a bit bold and unusual)?  
>>>>
>>>> The packet forwarding is for generating two frames copies on two HSR
>>>> coupled ports on a single KSZ9477:  
>>>
>>> Isn't that what duplication aka NETIF_F_HW_HSR_DUP is for?
>>
>> As I mentioned - the NETIF_F_HW_HSR_DUP is to remove duplicated frames.
>>
>> NETIF_F_HW_HSR_FWD is to in-hw generate frame copy for HSR port to be
>> sent:
>> https://elixir.bootlin.com/linux/v6.13/source/drivers/net/dsa/microchip/ksz9477.c#L1252
> 
> Ok, so you are using a different definition for the "forwarding". What
> puzzles me is the explanation for the HSR feature flags here [1]. They
> seem to suggest the following, which differs from your explanation:
> 
> Forwarding (aka NETIF_F_HW_HSR_FWD):
> 
> "Forwarding involves automatically forwarding between redundant ports in
> an HSR."
> 
> This sounds more like the "forwarding" of a HSR frame within the Ring,
> between two HSR ports, that I was thinking of.
> 
> Duplication (aka NETIF_F_HW_HSR_DUP):
> 
> "Duplication involves the switch automatically sending a single frame
> from the CPU port to both redundant ports."
> 
> This is contradictory to what you wrote above and sounds more
> reasonable. This is what you instead call forwarding above.
> 
> Are you mixing things up, here? Am I completely on the wrong track? I'm
> just trying to understand the basics here.
> 
>>>
>>>>
>>>> https://ww1.microchip.com/downloads/aemDocuments/documents/UNG/ApplicationNotes/ApplicationNotes/AN3474-KSZ9477-High-Availability-Seamless-Redundancy-Application-Note-00003474A.pdf
>>>>
>>>> The KSZ9477 chip also supports RX packet duplication removal, but
>>>> cannot guarantee 100% success (so as a fallback it is done in SW).
>>>>
>>>> The infrastructure from:
>>>> https://elixir.bootlin.com/linux/v6.13/source/tools/testing/selftests/net/hsr/hsr_redbox.sh#L50
>>>>
>>>> is enough to test HW accelerated forwarding (of KSZ9477) from NS1
>>>> and NS2.  
>>>
>>> I'm not really sure if I get it. In this setup NS1 and NS2 are
>>> connected via HSR link (two physical links). On one side packets are
>>> sent duplicated on both physical ports. On the receiving side the
>>> duplication is removed and one packet is forwarded to the CPU.
>>>
>>> Where is forwarding involved here? 
>>
>> In-HW forwarding is when KSZ9477 duplicates frame to be send on second
>> HSR aware port.
>>
>> (only 2 of them can be coupled to have in-hw support for duplication
>> and forwarding. Creating more hsr "interfaces" would just use SW).
>>
>>> Isn't forwarding only for cases
>>> with one intermediate node between the sending and receiving node?
>>
>> This kind of "forwarding" is done in software in the hsr driver.
> 
> But according to the official explanation of the flags [1] this kind of
> forwarding is exactly what NETIF_F_HW_HSR_FWD seems to be about.
> 
>>
>>>
>>>>   
>>>>>
>>>>> Anyway, do you have any suggestions for debugging this?
>>>>> Unfortunately I know almost nothing about this topic. But I can
>>>>> offer to test on my setup, at least for now. I don't know how long
>>>>> I will still have access to the hardware.  
>>>>
>>>> For some reason only frames to neighbours are delivered.
>>>>
>>>> So those are removed at some point (either in KSZ9477 HW or in HSR
>>>> driver itself).
>>>>
>>>> Do you have some dumps from tshark/wireshark to share?
>>>>   
>>>>>
>>>>> If we can't find a proper solution in the long run, I will probably
>>>>> send a patch to disable the accelerated forwarding to at least make
>>>>> HSR work by default.  
>>>>
>>>> As I've noted above - the HW accelerated forwarding is in the
>>>> KSZ9477 chip.  
>>>
>>> Yeah, but if the HW accelerated forwarding doesn't work
>>
>> The "forwarding" in KSZ9477 IC works OK, as frames are duplicated (i.e.
>> forwarded) to both HSR coupled ports.
> 
> No, I don't think duplication and forwarding refer to the same thing
> here. At least it doesn't make sense for me.
> 
>> The problem is with dropping frames travelling in connected KSZ9477
>> devices.
> 
> I'm not really sure.
> 
>>
>>> it would be
>>> better to use no acceleration and have it work in SW at least by
>>> default, right?
>>
>> IMHO, it would be best to fix the code.
> 
> Agreed.
> 
>>>
>>>>
>>>> The code which you uncomment, is following what I've understood from
>>>> the standard (and maybe the bug is somewhere there).  
>>>
>>> Ok, thanks for explaining. I will see if I can find some time to
>>> gather some more information on the problem.
>>
>> That would be very helpful. Thanks in advance for it.
> One more information I just found out: I can leave ksz9477_hsr_join()
> untouched and only remove the feature flags from
> KSZ9477_SUPPORTED_HSR_FEATURES to make things work.
> 
> Broken:
> 
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP |
> NETIF_F_HW_HSR_FWD)
> 
> Works:
> 
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_DUP)
> 
> Works:
> 
> #define KSZ9477_SUPPORTED_HSR_FEATURES (NETIF_F_HW_HSR_FWD)
> 
> Works:
> 
> #define KSZ9477_SUPPORTED_HSR_FEATURES (0)

Please take this with a grain of salt. I've just seen, that the behavior
is not 100% consistent.

I'm pretty sure that things work reliably with the diff I posted before
as I have tested this case more thoroughly.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ