Message-ID: <879c5144-183f-51d3-21e3-51c20d1d02b4@televic.com>
Date: Mon, 27 Jan 2020 15:41:13 +0100
From: Jürgen Lambrecht <j.lambrecht@...evic.com>
To: "Allan W. Nielsen" <allan.nielsen@...rochip.com>,
Andrew Lunn <andrew@...n.ch>
Cc: Horatiu Vultur <horatiu.vultur@...rochip.com>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
bridge@...ts.linux-foundation.org, jiri@...nulli.us,
ivecera@...hat.com, davem@...emloft.net, roopa@...ulusnetworks.com,
nikolay@...ulusnetworks.com, anirudh.venkataramanan@...el.com,
olteanv@...il.com, jeffrey.t.kirsher@...el.com,
UNGLinuxDriver@...rochip.com
Subject: Re: [RFC net-next v3 06/10] net: bridge: mrp: switchdev: Extend
switchdev API to offload MRP
On 1/27/20 12:04 PM, Allan W. Nielsen wrote:
>
> On 26.01.2020 16:59, Andrew Lunn wrote:
>>
>> On Sun, Jan 26, 2020 at 02:22:13PM +0100, Horatiu Vultur wrote:
>>> The 01/25/2020 17:35, Andrew Lunn wrote:
>>> >
>>> > > SWITCHDEV_OBJ_ID_RING_TEST_MRP: This is used to start/stop sending
>>> > > MRP_Test frames on the mrp ring ports. This is called only on nodes that have
>>> > > the role Media Redundancy Manager.
>>> >
>>> > How do you handle the 'headless chicken' scenario? User space tells
>>> > the port to start sending MRP_Test frames. It then dies. The hardware
>>> > continues sending these messages, and the neighbours think everything
>>> > is O.K, but in reality the state machine is dead, and when the ring
>>> > breaks, the daemon is not there to fix it?
> I agree, we need to find a solution to this issue.
>
>>> > And it is not just the daemon that could die. The kernel could oops or
>>> > deadlock, etc.
>>> >
>>> > For a robust design, it seems like SWITCHDEV_OBJ_ID_RING_TEST_MRP
>>> > should mean: start sending MRP_Test frames for the next X seconds, and
>>> > then stop. And the request is repeated every X-1 seconds.
> Sounds like a good idea to me.
Indeed. When the timer expires, the hardware should then do what is described below and become a 'dumb switch'. However, I propose making the fallback configurable: either auto-recovery ('dumb switch'), or a safe mode that keeps the ports blocked, in which case some higher-layer protocol has to fix it.
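
To illustrate the timed request combined with a configurable fallback, here is a minimal Python sketch. It only simulates the idea and is not switchdev code: the class MrpTestLease, the Fallback enum and all field names are hypothetical, time is passed in explicitly instead of read from a clock, and a single flag stands in for the blocked state of the ring port.

```python
from enum import Enum


class Fallback(Enum):
    DUMB_SWITCH = "dumb_switch"  # auto-recovery: unblock the ports on expiry
    SAFE_MODE = "safe_mode"      # keep the ports blocked on expiry


class MrpTestLease:
    """Simulated offload engine that sends MRP_Test only while a lease is valid."""

    def __init__(self, lease_s, fallback):
        self.lease_s = lease_s        # the "X seconds" from the proposal
        self.fallback = fallback
        self.expires_at = None
        self.sending = False
        self.ports_blocked = True     # assumption: ring port starts blocked

    def renew(self, now):
        """Called by the control plane, e.g. every X-1 seconds."""
        self.expires_at = now + self.lease_s
        self.sending = True

    def tick(self, now):
        """Called periodically by the (simulated) hardware."""
        if self.sending and now >= self.expires_at:
            # Control plane went silent: stop MRP_Test and apply the fallback.
            self.sending = False
            self.ports_blocked = self.fallback is Fallback.SAFE_MODE


if __name__ == "__main__":
    hw = MrpTestLease(lease_s=3, fallback=Fallback.DUMB_SWITCH)
    hw.renew(now=0)
    hw.renew(now=2)   # daemon is alive and renews after X-1 seconds
    hw.tick(now=4)
    print(hw.sending, hw.ports_blocked)
    hw.tick(now=5)    # daemon died: no renewal before the lease ran out
    print(hw.sending, hw.ports_blocked)
```

With Fallback.SAFE_MODE the only difference is that ports_blocked stays set after expiry, so the ring stays open until something else repairs it.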
>
>>> I totally missed this case, I will update this as you suggest.
>>
>> What does your hardware actually provide?
>>
>> Given the design of the protocol, if the hardware decides the OS etc
>> is dead, it should stop sending MRP_TEST frames and unblock the ports.
>> It then becomes a 'dumb switch', and for a short time there will be a
>> broadcast storm. Hopefully one of the other nodes will then take over
>> the role and block a port.
> As far as I know, the only feature HW has to prevent this is a
> watch-dog timer. Which will reset the entire system (not a bad idea if
> the kernel has dead-locked).
Indeed. Our designs always have a watchdog.
And then I again propose to have two boot-up options.
See also my reply to Allan's answer to my email of 12:29 PM.
Kind regards,
Jürgen
>
> /Allan
>