Message-ID: <CACcJQnQd0WuDc_PGLwCem-HVni9bmObky3qhOBv+2Uc4mSLtsA@mail.gmail.com>
Date: Fri, 20 Mar 2015 13:23:55 -0700
From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Andy Gospodarek <gospo@...ulusnetworks.com>,
Wilson Kok <wkok@...ulusnetworks.com>
Subject: Re: [PATCH net-next 0/3] net: introduce IFF_PROTO_DOWN flag.
On Fri, Mar 20, 2015 at 11:50 AM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Fri, Mar 20, 2015 at 9:45 AM, Anuradha Karuppiah
> <anuradhak@...ulusnetworks.com> wrote:
>> On Fri, Mar 20, 2015 at 9:13 AM, Alexei Starovoitov
>> <alexei.starovoitov@...il.com> wrote:
>>> On Fri, Mar 20, 2015 at 8:11 AM, <anuradhak@...ulusnetworks.com> wrote:
>>>> From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
>>>>
>>>> Applications can detect errors in the network that would require
>>>> disabling the device independently of the admin state. In the presence of
>>>> these errors, traffic could be black-holed or looped, resulting in a
>>>> network meltdown. Clearing the IFF_UP flag to error-disable the
>>>> device can be problematic because:
>>>>
>>>> 1. The administrator cannot distinguish between a user space daemon’s
>>>> error-disable and a regular device disable.
>>>> 2. Applications can monitor the error state and enable the device once
>>>> the error is removed. If IFF_UP is used for this purpose the application
>>>> may end up enabling a device that the administrator has intentionally
>>>> disabled for other reasons. This could result in network changes not
>>>> expected by the admin.
>>>>
>>>
>>> Both reasons look like a workaround for user space issues.
>>> Just keep this fake-down state in userspace.
>>> What's the point pushing it to kernel?
>>
>> Applications can deal with IFF_UP being cleared, and they can certainly
>> clear IFF_UP themselves on detecting errors. However, an application
>> cannot know the reason for a !IFF_UP notification. So if an
>> application detected that a device error had cleared, it would have to
>> unconditionally enable the device as part of its recovery handling,
>> thereby ignoring the administrator’s request to keep the device
>> disabled. Separating error-disable (IFF_PROTO_DOWN) from admin-disable
>> (!IFF_UP) lets the administrator have a say in keeping a device
>> disabled.
>>
>>> looking at 3rd patch:
>>> + * @IF_LINK_PROTO_DOWN_MLAG: proto_down by a multi-chassis LAG application.
>>> + * @IF_LINK_PROTO_DOWN_STP: proto_down by an STP application.
>>>
>>> so there will be a new flag for every application that cannot deal with
>>> a normal down?
>>
>> These applications can clear their error states independently of each
>> other. For example, say both STP BPDU guard and MLAG error-disabled a
>> device. When the MLAG split-brain error is resolved, the MLAG
>> application would clear IFF_PROTO_DOWN while the BPDU guard error
>> still exists. With a single flag and no per-application reason, that
>> re-enables the device prematurely, opening windows in which traffic
>> can again be looped or black-holed.
>>
>
> If I understand this correctly, you have an implementation of
> stp-bpdu guard in user space instead of in the bridge STP core, and
> that is causing these issues. If you move this feature into
> the kernel, you won't have to add this special down state, right?

IFF_PROTO_DOWN is needed to distinguish between the admin-disable and
error-disable states. Even if a kernel driver were setting or clearing the
error-disable state via dev_close()/dev_open(), it could still end up
overriding the administrator’s need to keep a device DOWN as part of
its error recovery handling. To avoid that problem, the kernel STP
BPDU guard could also use dev_set_proto_down(…,IF_LINK_PROTO_DOWN_STP)
to disable misbehaving access ports.