Message-ID: <CACcJQnQyW7GWp42AuRBwKpzVRAGnXoB7ddtcURW5GxDa_4wKKQ@mail.gmail.com>
Date: Wed, 29 Apr 2015 16:25:07 -0700
From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Scott Feldman <sfeldma@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Andy Gospodarek <gospo@...ulusnetworks.com>,
Wilson Kok <wkok@...ulusnetworks.com>
Subject: Re: [RFC PATCH net-next v3 1/4] net core: Add IFF_PROTO_DOWN support.
On Wed, Apr 29, 2015 at 4:07 PM, Anuradha Karuppiah
<anuradhak@...ulusnetworks.com> wrote:
> On Wed, Apr 29, 2015 at 3:13 PM, Stephen Hemminger
> <stephen@...workplumber.org> wrote:
>> On Mon, 27 Apr 2015 10:38:21 -0700
>> anuradhak@...ulusnetworks.com wrote:
>>
>>> From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
>>>
>>> This patch introduces an IFF_PROTO_DOWN flag that can be used by
>>> user space applications to notify drivers that errors have been
>>> detected on the device.
>>>
>>> Signed-off-by: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
>>> Signed-off-by: Andy Gospodarek <gospo@...ulusnetworks.com>
>>> Signed-off-by: Roopa Prabhu <roopa@...ulusnetworks.com>
>>> Signed-off-by: Wilson Kok <wkok@...ulusnetworks.com>
>>
>> I worry that adding another bit to an already complex state API
>> will break userspace.
>>
>> There are lots of things besides iproute2 which look at those
>> flags including routing daemons (quagga), network manager, netplugd,
>> and switch controllers.
>
> Yes, I understand your concerns here. I tried to avoid introducing a
> separate error flag by clearing IFF_UP on proto_down, i.e. when errors are
> detected (as Scott also brought up earlier).
>
> That implementation failed for the following reasons:
> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
> APP/driver enforced error_down (IFF_PROTO_DOWN). Administrators or
> automation scripts that monitor the config assumed that the switch-port
> configuration had somehow fallen out of sync (and repeatedly attempted to
> reinstate the admin_up).
>
> 2. Automatic error recovery was not possible. Consider, for example, the
> following scenario:
> a. The MLAG peer-link is down so the MLAG app on the secondary switch has
> proto_down’ed all the MLAG ports (including switch-port swp1) by clearing
> IFF_UP.
> b. At the same time the administrator is in the process of making some
> changes on the network connected to swp1. To avoid doing it live he would
> admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
> is a no-op as event #a has already cleared IFF_UP on swp1).
> c. If the MLAG peer-link recovers at this point, the MLAG app on the
> secondary switch would try to automatically recover the MLAG ports,
> including swp1, by clearing proto_down (i.e. setting IFF_UP). Doing
> that overrides the administrator’s directive to keep swp1 admin_down.
> Overriding an admin-down in a live network can be very dangerous, so
> auto-error-recovery is not possible unless we have a way to
> disambiguate between the admin and error states.
I need to disambiguate the error state, but it doesn't have to be an
IFF_X attribute. Stephen, do you think it would be more easily consumable if
it were a new, separate net_device attribute instead of a new bit in
"&struct net_device flags"?
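To make the scenario in steps a-c concrete, here is a minimal Python model of
the two designs. It is only a sketch, not kernel code: the class names, the
method names and the carrying_traffic() helper are all made up for
illustration and do not correspond to any existing kernel or iproute2
interface.

```python
from dataclasses import dataclass


@dataclass
class SingleFlagPort:
    """Hypothetical model: the app signals errors by clearing IFF_UP itself."""
    up: bool = True

    def admin_set(self, up: bool) -> None:
        # Models "ip link set swp1 up/down".
        self.up = up

    def app_proto_down(self, down: bool) -> None:
        # The app has no state of its own, so it reuses the admin flag.
        self.up = not down

    def carrying_traffic(self) -> bool:
        return self.up


@dataclass
class TwoStatePort:
    """Hypothetical model: admin state and error state are kept separately."""
    admin_up: bool = True
    proto_down: bool = False

    def admin_set(self, up: bool) -> None:
        self.admin_up = up

    def app_proto_down(self, down: bool) -> None:
        self.proto_down = down

    def carrying_traffic(self) -> bool:
        # The port is usable only if the admin wants it up AND no error is set.
        return self.admin_up and not self.proto_down


def run_scenario(port) -> bool:
    """Replay steps a-c and report whether the port ends up carrying traffic."""
    port.app_proto_down(True)    # a. peer-link down, app proto_downs the port
    port.admin_set(False)        # b. admin takes the port down for maintenance
    port.app_proto_down(False)   # c. peer-link recovers, app clears proto_down
    return port.carrying_traffic()


if __name__ == "__main__":
    print("single flag:", run_scenario(SingleFlagPort()))
    print("two states: ", run_scenario(TwoStatePort()))
```

With the single flag the port comes back up at step c even though the
administrator asked for it to stay down. With separate states the admin-down
survives the automatic recovery, which is the behaviour I am after regardless
of whether the error state lives in an IFF_X bit or in a separate attribute.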