Message-ID: <CAE4R7bAbe34eOqfdEUqNAp57936Eqcu54+r_Bp75CDdB4FgThw@mail.gmail.com>
Date: Tue, 28 Apr 2015 12:37:46 -0700
From: Scott Feldman <sfeldma@...il.com>
To: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Andy Gospodarek <gospo@...ulusnetworks.com>,
Wilson Kok <wkok@...ulusnetworks.com>
Subject: Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.
On Tue, Apr 28, 2015 at 8:39 AM, Anuradha Karuppiah
<anuradhak@...ulusnetworks.com> wrote:
>
>
> On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@...il.com> wrote:
>>
>> On Mon, Apr 27, 2015 at 10:38 AM, <anuradhak@...ulusnetworks.com> wrote:
>> > From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
>> >
>> > User space daemons can detect errors in the network that the switch
>> > device drivers need to be notified of.
>> >
>> > Drivers can react to this error state by doing a phy-down on the
>> > switch-port, which would result in a carrier-off locally and on the
>> > directly connected switch. Doing that would prevent loops and
>> > black-holes in the network.
>>
>> (Sorry if this was asked earlier)
>>
>> Can the application simply send a SETLINK with IFF_UP cleared, so that
>> the port driver's ndo_stop brings the PHY link down?
>
>
> Yes, clearing IFF_UP on detecting errors (PROTO_DOWN) is possible, and we
> tried that implementation as well. Unfortunately it failed for the
> following reasons:
>
> 1. There is no way to disambiguate between admin_down (!IFF_UP) and an
> app/driver-enforced error_down (IFF_PROTO_DOWN). Administrators or
> automation scripts that monitor the config assumed that the switch-port
> configuration had somehow fallen out of sync, and repeatedly attempted to
> reinstate admin_up.
>
> 2. Automatic error recovery was not possible. Consider the following
> scenario, for example:
> a. The MLAG peer-link is down, so the MLAG app on the secondary switch has
>    proto_down’ed all the MLAG ports (including switch-port swp1) by
>    clearing IFF_UP.
> b. At the same time, the administrator is making some changes on the
>    network connected to swp1. To avoid doing this live, he would
>    admin_disable swp1 (!IFF_UP) with "ip link set swp1 down" (this is a
>    no-op, as event #a has already cleared IFF_UP on swp1).
> c. If the MLAG peer-link recovers at this point, the MLAG app on the
>    secondary switch would try to automatically recover the MLAG ports by
>    clearing proto_down (i.e. setting IFF_UP), including on swp1. Doing
>    that overrides the administrator’s directive to keep swp1 admin_down.
>    Overriding an admin-down in a live network can be very dangerous, so
>    auto-error-recovery is not possible unless we have a way to
>    disambiguate between the admin and error states.
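The scenario in point 2 can be replayed with a small user-space model. This is a sketch, not kernel code: `struct port_model` and its helpers are hypothetical, with `admin_up` standing in for IFF_UP and `proto_down` for the proposed IFF_PROTO_DOWN. In the single-flag variant the MLAG app reuses the admin bit, as in the rejected approach; in the two-flag variant the app has its own bit, and the PHY is up only when the port is admin up and not proto down.

```c
#include <stdbool.h>

/* Hypothetical model of one switch port (e.g. swp1). */
struct port_model {
    bool admin_up;    /* stands in for IFF_UP, owned by the administrator */
    bool proto_down;  /* stands in for the proposed IFF_PROTO_DOWN, owned by the app */
    bool two_flags;   /* false: app reuses admin_up; true: app uses proto_down */
};

/* PHY/carrier state the driver would program for this port. */
static bool phy_up(const struct port_model *p)
{
    return p->admin_up && !p->proto_down;
}

/* MLAG app reacts to peer-link loss (step a) or recovery (step c). */
static void app_set_error(struct port_model *p, bool error)
{
    if (p->two_flags)
        p->proto_down = error;
    else
        p->admin_up = !error;   /* overloads the administrator's bit */
}

/* Administrator runs "ip link set swp1 down/up" (step b). */
static void admin_set_up(struct port_model *p, bool up)
{
    p->admin_up = up;
}

/* Replay steps a-c on swp1 and return its final PHY state. */
static bool replay_scenario(bool two_flags)
{
    struct port_model swp1 = { .admin_up = true, .proto_down = false,
                               .two_flags = two_flags };

    app_set_error(&swp1, true);   /* a. peer-link down: app takes swp1 down */
    admin_set_up(&swp1, false);   /* b. admin disables swp1 (no-op with one flag) */
    app_set_error(&swp1, false);  /* c. peer-link back: app recovers swp1 */
    return phy_up(&swp1);
}
```

`replay_scenario(false)` returns true, i.e. swp1 comes back up against the administrator's directive, while `replay_scenario(true)` returns false because the admin-down survives the app's recovery.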
That makes sense.
Dang, this is so close to IFF_DORMANT. The interface can be IFF_UP
while its link mode is DORMANT. Could the port driver kill the PHY link
if dev->flags & IFF_DORMANT in ndo_set_rx_mode()? That would require
IFF_DORMANT to be included in dev->flags by __dev_change_flags().
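The IFF_DORMANT idea can be sketched the same way, again as a user-space model rather than driver code. `MODEL_IFF_UP`, `MODEL_IFF_DORMANT`, `struct port_dev` and `port_set_rx_mode` are hypothetical stand-ins for the kernel bits, dev->flags and an ndo_set_rx_mode() handler; the scheme only works in a real driver if __dev_change_flags() lets IFF_DORMANT through to dev->flags, as noted above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for IFF_UP and IFF_DORMANT
 * (bit positions mirror <linux/if.h>). */
#define MODEL_IFF_UP       (1u << 0)
#define MODEL_IFF_DORMANT  (1u << 17)

/* Hypothetical port device: flags plus the PHY state the driver programs. */
struct port_dev {
    unsigned int flags;  /* stands in for dev->flags */
    bool phy_link_up;    /* what the driver last programmed into the PHY */
};

/* Sketch of an ndo_set_rx_mode()-style hook, re-run whenever the flags
 * change: the port keeps IFF_UP (the admin intent is preserved), but the
 * PHY link is killed while the port is dormant. */
static void port_set_rx_mode(struct port_dev *dev)
{
    dev->phy_link_up = (dev->flags & MODEL_IFF_UP) &&
                       !(dev->flags & MODEL_IFF_DORMANT);
}
```

Setting `MODEL_IFF_DORMANT` on an up port and re-running the hook drops the PHY link while `flags & MODEL_IFF_UP` still reads back as set, which is the admin-versus-error distinction point 1 asks for; clearing the dormant bit brings the link back without touching the admin bit.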