Message-ID: <20200127110418.f7443ecls6ih2fwt@lx-anielsen.microsemi.net>
Date: Mon, 27 Jan 2020 12:04:18 +0100
From: "Allan W. Nielsen" <allan.nielsen@...rochip.com>
To: Andrew Lunn <andrew@...n.ch>
CC: Horatiu Vultur <horatiu.vultur@...rochip.com>,
<linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
<bridge@...ts.linux-foundation.org>, <jiri@...nulli.us>,
<ivecera@...hat.com>, <davem@...emloft.net>,
<roopa@...ulusnetworks.com>, <nikolay@...ulusnetworks.com>,
<anirudh.venkataramanan@...el.com>, <olteanv@...il.com>,
<jeffrey.t.kirsher@...el.com>, <UNGLinuxDriver@...rochip.com>
Subject: Re: [RFC net-next v3 06/10] net: bridge: mrp: switchdev: Extend
switchdev API to offload MRP
On 26.01.2020 16:59, Andrew Lunn wrote:
>On Sun, Jan 26, 2020 at 02:22:13PM +0100, Horatiu Vultur wrote:
>> The 01/25/2020 17:35, Andrew Lunn wrote:
>> > > SWITCHDEV_OBJ_ID_RING_TEST_MRP: This is used to start/stop sending
>> > > MRP_Test frames on the mrp ring ports. This is called only on nodes that have
>> > > the role Media Redundancy Manager.
>> >
>> > How do you handle the 'headless chicken' scenario? User space tells
>> > the port to start sending MRP_Test frames. It then dies. The hardware
>> > continues sending these messages, and the neighbours think everything
>> > is O.K, but in reality the state machine is dead, and when the ring
>> > breaks, the daemon is not there to fix it?
I agree, we need to find a solution to this issue.
>> > And it is not just the daemon that could die. The kernel could oops or
>> > deadlock, etc.
>> >
>> > For a robust design, it seems like SWITCHDEV_OBJ_ID_RING_TEST_MRP
>> > should mean: start sending MRP_Test frames for the next X seconds, and
>> > then stop. And the request is repeated every X-1 seconds.
Sounds like a good idea to me.
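To make the "send for X seconds, refresh every X-1 seconds" idea concrete,
here is a minimal sketch in plain C. It models the lease with a logical
clock only; it is not the switchdev API or any driver code, and the struct
and function names (mrp_test_lease, mrp_test_lease_refresh,
mrp_test_lease_tick) are hypothetical, made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an offloaded MRP_Test generator guarded by a lease.
 * Time is a logical counter in seconds, supplied by the caller.
 */
struct mrp_test_lease {
	uint64_t expires_at;	/* logical time at which the lease ends */
	bool active;		/* true while MRP_Test frames may be sent */
};

/*
 * Start or refresh the lease: the sender may emit MRP_Test frames for
 * 'duration' seconds counted from 'now'. The controlling software is
 * expected to call this again before the lease runs out.
 */
static void mrp_test_lease_refresh(struct mrp_test_lease *l,
				   uint64_t now, uint64_t duration)
{
	l->expires_at = now + duration;
	l->active = true;
}

/*
 * Called on every sender tick. Returns true if an MRP_Test frame may be
 * sent at time 'now'. Once the lease has expired without a refresh, the
 * controller is presumed dead and sending stops until the next refresh.
 */
static bool mrp_test_lease_tick(struct mrp_test_lease *l, uint64_t now)
{
	if (l->active && now >= l->expires_at)
		l->active = false;
	return l->active;
}
```

With X = 5, a controller that refreshes at t = 0 and t = 4 keeps the sender
running; if it then dies, the sender stops by itself at t = 9 instead of
continuing forever.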
>> I totally missed this case, I will update this as you suggest.
>
>What does your hardware actually provide?
>
>Given the design of the protocol, if the hardware decides the OS etc
>is dead, it should stop sending MRP_TEST frames and unblock the ports.
>It then becomes a 'dumb switch', and for a short time there will be a
>broadcast storm. Hopefully one of the other nodes will then take over
>the role and block a port.
As far as I know, the only feature the HW has to prevent this is a
watchdog timer, which will reset the entire system (not a bad idea if
the kernel has deadlocked).
/Allan