netdev - Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO

Date:	Tue, 28 Apr 2015 08:43:51 -0700
From:	Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
To:	unlisted-recipients:; (no To-header on input)
Cc:	Netdev <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH net-next v3 0/4] net: Introduce IFF_PROTO_DOWN flag.

On Mon, Apr 27, 2015 at 10:45 PM, Scott Feldman <sfeldma@...il.com> wrote:
> On Mon, Apr 27, 2015 at 10:38 AM,  <anuradhak@...ulusnetworks.com> wrote:
>> From: Anuradha Karuppiah <anuradhak@...ulusnetworks.com>
>>
>> User space daemons can detect errors in the network that need to be
>> notified to the switch device drivers.
>>
>> Drivers can react to this error state by doing a phy-down on the
>> switch-port which would result in a carrier-off locally and on the
>> directly connected switch. Doing that would prevent loops and
>> black-holes in the network.
>
> (Sorry if this was asked earlier)
>
> Can the application simply send a SETLINK with IFF_UP clear and the
> port driver's ndo_stop would bring the PHY link down?

(Re-sending as plain text) -

Yes, clearing IFF_UP when an error (PROTO_DOWN) is detected is possible, and we
tried that implementation as well. Unfortunately it failed, for the following
reasons:

1. There is no way to disambiguate between admin_down (!IFF_UP) and an
APP/driver enforced error_down (IFF_PROTO_DOWN). Administrators or
automation scripts that monitor the config assumed that the switch-port
configuration had somehow fallen out of sync, and repeatedly attempted to
reinstate admin_up.

2. Automatic error recovery was not possible. Consider, for example, the
following scenario:
   a. The MLAG peer-link is down so the MLAG app on the secondary switch has
      proto_down’ed all the MLAG ports (including switch-port swp1) by clearing
      IFF_UP.
   b. At the same time the administrator is in the process of making some
      changes on the network connected to swp1. To avoid doing it live he would
      admin_disable swp1 (!IFF_UP) by doing an "ip link set swp1 down" (this
      is a no-op as event #a has already cleared IFF_UP on swp1).
   c. If the MLAG peer-link recovers at this point, the MLAG app on the
      secondary switch would try to automatically recover the MLAG ports,
      including swp1, by clearing proto_down (i.e. setting IFF_UP). Doing
      that overrides the administrator’s directive to keep swp1 admin_down.
      Overriding an admin-down in a live network can be very dangerous, so
      auto-error-recovery is not possible unless we have a way to
      disambiguate between the admin and error states.
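
To make the scenario above concrete, here is a rough user-space sketch in
Python. It is only a toy model, not kernel code: the Port class, its fields
and the two scenario functions are hypothetical names made up for
illustration, with admin_up standing in for IFF_UP and proto_down standing in
for IFF_PROTO_DOWN. It assumes the link is enabled only when the port is
admin-up and not proto-down.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Toy model of a switch port (hypothetical, for illustration only)."""
    name: str
    admin_up: bool = True      # stands in for IFF_UP
    proto_down: bool = False   # stands in for IFF_PROTO_DOWN

    def link_enabled(self) -> bool:
        # Assumption: the link comes up only if the admin wants it up
        # and no application/driver has flagged an error on the port.
        return self.admin_up and not self.proto_down


def scenario_single_flag() -> bool:
    """Steps a-c when the MLAG app reuses IFF_UP to signal errors."""
    port = Port("swp1")
    port.admin_up = False   # a. MLAG app clears IFF_UP (peer-link down)
    port.admin_up = False   # b. admin runs "ip link set swp1 down" (no-op)
    port.admin_up = True    # c. MLAG app recovers by setting IFF_UP
    return port.link_enabled()


def scenario_two_flags() -> bool:
    """Steps a-c when the error state lives in a separate flag."""
    port = Port("swp1")
    port.proto_down = True   # a. MLAG app sets proto_down (peer-link down)
    port.admin_up = False    # b. admin runs "ip link set swp1 down"
    port.proto_down = False  # c. MLAG app recovers by clearing proto_down
    return port.link_enabled()


if __name__ == "__main__":
    print("single flag, link enabled after recovery:", scenario_single_flag())
    print("two flags, link enabled after recovery:", scenario_two_flags())
```

With a single flag, step c brings swp1 back up against the administrator's
intent. With two flags, the admin-down set in step b survives the automatic
recovery.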