Message-ID: <CAGL4nSOwAMW_yQ129J_jeukQffv=xQfhpA9mnKsmvJe+FvQa3w@mail.gmail.com>
Date: Sun, 12 Nov 2023 21:52:07 +0100
From: Kristian Myrland Overskeid <koverskeid@...il.com>
To: Heiko Gerstung <heiko.gerstung@...nberg.de>
Cc: Andrew Lunn <andrew@...n.ch>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: PRP with VLAN support - or how to contribute to a Linux network driver
Hi Heiko,
> thanks a lot for your response - we tried removing the NETIF_F_VLAN_CHALLENGED flag and it did not work for us. We could set up a VLAN interface on top of the PRP interface, but traffic did not get through. I will retest this to make sure we did not overlook something.
It worked for me on Ubuntu 22.04.3 LTS, but I haven't tried it on
other distros. You can use tcpdump to check whether the VLAN frames
reach the PRP interface. If they don't, it's probably a VLAN
configuration issue.
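If it helps to know what to look for in a capture: a VLAN-tagged frame
carries the 802.1Q TPID 0x8100 right after the two MAC addresses,
followed by the 16-bit tag control field. Below is a minimal Python
sketch that checks a raw Ethernet frame for such a tag. The function
name vlan_tag is made up for illustration; it is not part of tcpdump or
the hsr module.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_tag(frame: bytes) -> Optional[Tuple[int, int]]:
    """Return (vlan_id, priority) if the frame is 802.1Q tagged, else None."""
    # A tagged Ethernet header is 18 bytes: two MACs, TPID, TCI, inner EtherType.
    if len(frame) < 18:
        return None
    # TPID and TCI sit directly after the destination and source MAC (12 bytes).
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    # Lower 12 bits are the VLAN ID, upper 3 bits the priority (PCP).
    return tci & 0x0FFF, tci >> 13
```

If the frames seen on the PRP interface come back as None here although
the sender tags them, the tag is being removed somewhere on the way.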
One thing you should be aware of: unless you're testing on the
vanilla kernel, you should compare the source code of your hsr module
with the vanilla kernel's. For example, the hsr module on Ubuntu is far
behind the vanilla kernel, and I needed to add changes manually to get
rid of some bugs (not related to VLAN, though). If you are using a
distro with an even more outdated hsr module, this could be the reason
why your tests are failing with the NETIF_F_VLAN_CHALLENGED flag
removed.
> If I understand correctly, this would make the discard process more robust, because in the access port scenario the frames can arrive in an even more mixed-up order? Or do you mean that the access port removes the VLAN tag and sends the frames untagged to the node?
I see that I could have explained myself better here. I meant that the
access port removes the VLAN tag and sends the frames untagged to
the node. In this case you cannot distinguish between the different
VLANs, which means that you have to keep track of all sequence numbers
that should be dropped, so that legitimate frames arriving in a
different order are not dropped. I wrote that the VLAN ID must be
stored, but this is not necessary, since the source nodes don't
consider VLAN IDs when setting the sequence number for outgoing frames.
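To make the idea concrete, here is a rough Python sketch of such an
order-independent discard table. It is not the code from the hsr module
or from the ZHAW program. The class name DuplicateDiscard, its methods
and the 0.4 s forget time are assumptions picked for illustration. Each
accepted (source MAC, sequence number) pair is remembered for a short
time; a second copy inside that window is dropped, and old entries are
purged so that a wrapped-around sequence number is accepted again.

```python
import time
from typing import Callable, Dict, Tuple

SEQ_MODULO = 1 << 16  # PRP/HSR sequence numbers are 16 bits wide


class DuplicateDiscard:
    """Hypothetical duplicate filter that does not depend on arrival order."""

    def __init__(self, forget_after: float = 0.4,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.forget_after = forget_after  # seconds an entry is kept (assumed value)
        self.clock = clock
        self.seen: Dict[Tuple[bytes, int], float] = {}

    def purge(self) -> None:
        """Drop entries old enough that the sequence number may be reused."""
        now = self.clock()
        expired = [key for key, stamp in self.seen.items()
                   if now - stamp >= self.forget_after]
        for key in expired:
            del self.seen[key]

    def accept(self, src_mac: bytes, seq: int) -> bool:
        """Return True for the first copy of a frame, False for its duplicate."""
        self.purge()  # simple but O(n); a real implementation would use a timer
        key = (src_mac, seq % SEQ_MODULO)
        if key in self.seen:
            return False
        self.seen[key] = self.clock()
        return True
```

With this, frames 10 and 9 from the same node are both accepted even if
9 arrives after 10, while the second copy of either one is dropped. The
VLAN ID is not part of the key, for the reason given above.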
Kristian
On Fri, 10 Nov 2023 at 09:24, Heiko Gerstung <heiko.gerstung@...nberg.de> wrote:
>
>
>
> On 09.11.23, 13:20, "Kristian Myrland Overskeid" <koverskeid@...il.com> wrote:
>
>
> Hi Kristian,
>
> > If you simply remove the line "dev->features |=
> > NETIF_F_VLAN_CHALLENGED;" in hsr_device.c, the hsr-module is handling
> > vlan frames without any further modifications. Unless you need to send
> > vlan tagged supervision frames, I'm pretty sure the current
> > implementation works just as well with vlan as without.
>
> thanks a lot for your response - we tried removing the NETIF_F_VLAN_CHALLENGED flag and it did not work for us. We could set up a VLAN interface on top of the PRP interface, but traffic did not get through. I will retest this to make sure we did not overlook something.
>
> > However, in my opinion, the discard-algorithm
> > (hsr_register_frame_out() in hsr_framereg.c) is not made for switched
> > networks. The problem with the current implementation is that it does
> > not account for frames arriving in a different order than they were sent
> > from a host. It simply checks if the sequence number of an arriving
> > frame is higher than the previous one. If the network has some sort of
> > priority, it must be expected that frames will arrive out of order
> > when the network load is big enough for the switches to start
> > prioritizing.
> >
> > My solution was to add a linked list to the node struct, one for each
> > registered vlan id. It contains the vlan id, last sequence number and
> > time. On reception of a vlan frame to the HSR_PT_MASTER, it retrieves
> > the "node_seq_out" and "node_time_out" based on the vlan.
>
> I agree that it would be necessary to handle frames arriving in a mixed up order.
>
> > This works fine for me because all the prp nodes are connected to
> > trunk ports and the switches are prioritizing frames based on the vlan
> > tag.
>
> > If a prp node is connected to an access port, but the network is using
> > vlan priority, all sequence numbers and timestamps with the
> > corresponding vlan id must be kept in a hashed list. The list must be
> > regularly checked to remove elements before new frames with a wrapped
> > around sequence number can arrive.
>
> If I understand correctly, this would make the discard process more robust, because in the access port scenario the frames can arrive in an even more mixed-up order? Or do you mean that the access port removes the VLAN tag and sends the frames untagged to the node?
>
> > ZHAW School of Engineering has made a prp program for both linux user
> > and kernel space with such a discard algorithm. The program does not
> > compile without some modifications, but the discard algorithm works
> > fine. The program is open source and can be found at
> > https://github.com/ZHAW-InES-Team/sw_stack_prp1.
>
>
> I will reach out to ZHAW and check with them if they would be willing to implement their more robust discard mechanism into the hsr module. The github repo has a note saying it moved to github.zhaw.ch which I cannot access as it requires credentials.
>
> Thanks again,
>
> Heiko
>
>
>
>
>
> On Thu, 9 Nov 2023 at 09:08, Heiko Gerstung <heiko.gerstung@...nberg.de> wrote:
> >
> > On 08.11.23, 16:17, "Andrew Lunn" <andrew@...n.ch> wrote:
> >
> >
> > >> I would like to discuss if it makes sense to remove the PRP
> > >> functionality from the HSR driver (which is based on the bridge
> > >> kernel module AFAICS) and instead implement PRP as a separate module
> > >> (based on the Bonding driver, which would make more sense for PRP).
> >
> >
> > > Seems like nobody replied. I don't know PRP or HSR, so i can only make
> > > general remarks.
> >
> > Thank you for responding!
> >
> > > The general policy is that we don't rip something out and replace it
> > > with new code. We try to improve what already exists to meet the
> > > demands. This is partially because of backwards compatibility. There
> > > could be users using the code as is. You cannot break that. Can you
> > > step by step modify the current code to make use of bonding, and in
> > > the process show you don't break the current use cases?
> >
> > Understood. I am not sure if we can change the hsr driver to gradually use a more bonding-like approach for prp and I believe this might not be required, as long as we can get VLAN support into it.
> >
> > > You also need to consider offloading to hardware. The bridge code has infrastructure
> > > to offload. Does the bond driver? I've no idea about that.
> >
> > I do not know this either but would expect that the nature of bonding would not require offloading support (I do not see a potential for efficiency/performance improvements here, unlike HSR or PRP).
> >
> > >> Hoping for advice on what the next steps could be. Happy to discuss
> > >> this off-list as it may not be of interest for most people.
> >
> > > You probably want to get together with others who are interested in
> > > PRP and HSR. linutronix, ti, microchip, etc.
> >
> > Yes, would love to do that and my hope was that I would find them here. I am not familiar with the "orphaned" status for a kernel module, but I would have expected that one of the mentioned parties interested in PRP/HSR would have adopted the module.
> >
> > > Andrew
> >
> > Again, thanks a lot for your comments and remarks, very useful.
> >
> > Heiko