netdev - Re: [PATCH net v3] failover: allow name change on IFF

Message-ID: <f90d9dfa-f2f1-ff04-f3fe-88fa0deffdf7@oracle.com>
Date:   Wed, 27 Mar 2019 13:10:10 -0700
From:   si-wei liu <si-wei.liu@...cle.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>,
        Stephen Hemminger <stephen@...workplumber.org>
Cc:     sridhar.samudrala@...el.com, davem@...emloft.net, kubakici@...pl,
        alexander.duyck@...il.com, jiri@...nulli.us,
        netdev@...r.kernel.org, virtualization@...ts.linux-foundation.org,
        liran.alon@...cle.com, boris.ostrovsky@...cle.com,
        vijay.balakrishna@...cle.com
Subject: Re: [PATCH net v3] failover: allow name change on IFF_UP slave
 interfaces



On 3/27/2019 6:25 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 26, 2019 at 07:13:42PM -0700, Stephen Hemminger wrote:
>> On Tue, 26 Mar 2019 19:48:13 -0400
>> Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>
>>> When a netdev appears through hot plug then gets enslaved by a failover
>>> master that is already up and running, the slave will be opened
>>> right away after getting enslaved. Today there's a race that userspace
>>> (udev) may fail to rename the slave if the kernel (net_failover)
>>> opens the slave earlier than when the userspace rename happens.
>>> Unlike bond or team, the primary slave of failover can't be renamed by
>>> userspace ahead of time, since the kernel initiated auto-enslavement is
>>> unable to, or rather, is never meant to be synchronized with the rename
>>> request from userspace.
>>>
>>> As the failover slave interfaces are not designed to be operated
>>> directly by userspace apps: IP configuration, filter rules with
>>> regard to network traffic passing and etc., should all be done on master
>>> interface. In general, userspace apps only care about the
>>> name of master interface, while slave names are less important as long
>>> as admin users can see reliable names that may carry
>>> other information describing the netdev. For e.g., they can infer that
>>> "ens3nsby" is a standby slave of "ens3", while for a
>>> name like "eth0" they can't tell which master it belongs to.
>>>
>>> Historically the name of IFF_UP interface can't be changed because
>>> there might be admin script or management software that is already
>>> relying on such behavior and assumes that the slave name can't be
>>> changed once UP. But failover is special: with the in-kernel
>>> auto-enslavement mechanism, the userspace expectation for device
>>> enumeration and bring-up order is already broken. Previously initramfs
>>> and various userspace config tools were modified to bypass failover
>>> slaves because of auto-enslavement and duplicate MAC address. Similarly,
>>> in case that users care about seeing reliable slave name, the new type
>>> of failover slaves needs to be taken care of specifically in userspace
>>> anyway.
>>>
>>> It's less risky to lift up the rename restriction on failover slave
>>> which is already UP. Although it's possible this change may potentially
>>> break userspace component (most likely configuration scripts or
>>> management software) that assumes slave name can't be changed while
>>> UP, it's relatively a limited and controllable set among all userspace
>>> components, which can be fixed specifically to listen for the rename
>>> and/or link down/up events on failover slaves. Userspace component
>>> interacting with slaves is expected to be changed to operate on failover
>>> master interface instead, as the failover slave is dynamic in nature
>>> which may come and go at any point.  The goal is to make the role of
>>> failover slaves less relevant, and userspace components should only
>>> deal with failover master in the long run.
>>>
>>> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@...cle.com>
>>> Reviewed-by: Liran Alon <liran.alon@...cle.com>
>>
>> Why do you need to do dev_close/dev_open which will bounce
>> the link?
> What we need is notify userspace that link went up/down.
> close/open will do that but just sending notifications
> would do that as well without playing with link states.
>
You were requesting fake link down/up events around the rename so as to
keep existing userspace intact with this behavioral change, right? The
thing is, you can't fake the notification by just setting or clearing
IFF_UP and then claim everything is done. If you look at
rtnl_fill_ifinfo(), where the notification payload is prepared, you'll
find that a lot of states and flags are correlated:

ifi_flags
IFLA_OPERSTATE
IFLA_CARRIER
IFLA_CARRIER_CHANGES

which requires the states below to be toggled or taken care of in between:

operstate
__LINK_STATE_START
__LINK_STATE_NOCARRIER
carrier_changes

For example, users mostly treat IFF_RUNNING, as opposed to IFF_UP, as the
indication of link up/down. That would require you to toggle
__LINK_STATE_START (and operstate as well) without doing a full
dev_close/open. Once __LINK_STATE_START is cleared, there's no sense in
letting CARRIER_OK remain set, and then you'd need to take care of
carrier_changes... Since you aren't really shutting down the device, the
link watchdog keeps running and may race with the inconsistent carrier
state in between. dev_close/open may do some unneeded work, but it's the
safest option IMHO: the cost and ugly complexity of faking link down/up
events is not worthwhile compared to simply bouncing the link state.
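
To illustrate the correlation described above, here is a minimal
userspace model in Python of how the reported flags depend on more than
IFF_UP. It is only a sketch: the Dev class and reported_flags() are
hypothetical names, the logic is a simplified approximation of what the
kernel does when it fills ifi_flags, and only the numeric constants are
taken from the uapi headers.

```python
from dataclasses import dataclass

# Values as defined in <linux/if.h>
IFF_UP = 0x1
IFF_RUNNING = 0x40
IFF_LOWER_UP = 0x10000
IF_OPER_UNKNOWN = 0
IF_OPER_DOWN = 2
IF_OPER_UP = 6


@dataclass
class Dev:
    """Hypothetical stand-in for the bits of struct net_device discussed here."""
    flags: int = 0                 # administrative flags, where IFF_UP lives
    started: bool = False          # models __LINK_STATE_START
    nocarrier: bool = True         # models __LINK_STATE_NOCARRIER
    operstate: int = IF_OPER_DOWN  # models dev->operstate


def reported_flags(dev: Dev) -> int:
    """Simplified model of the flags userspace sees in a link notification."""
    flags = dev.flags & ~(IFF_RUNNING | IFF_LOWER_UP)
    if dev.started:
        # IFF_RUNNING is derived from operstate, not from IFF_UP
        if dev.operstate in (IF_OPER_UP, IF_OPER_UNKNOWN):
            flags |= IFF_RUNNING
        # IFF_LOWER_UP is derived from the carrier bit
        if not dev.nocarrier:
            flags |= IFF_LOWER_UP
    return flags


if __name__ == "__main__":
    # A "faked" down event that only clears IFF_UP still reports IFF_RUNNING.
    faked = Dev(flags=0, started=True, nocarrier=False, operstate=IF_OPER_UP)
    print(bool(reported_flags(faked) & IFF_RUNNING))
```

In this model a monitor keyed on IFF_RUNNING would not see the link go
down unless the start bit (and with it operstate and carrier) is also
handled, which is the bookkeeping a full dev_close/open already does.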

Another point is that in-kernel consumers of the NETDEV_CHANGENAME
notifier might well assume the link has already been taken down by
dev_close() before the rename. I didn't check all of those consumers in
the tree, but thought it would be safer to keep the current convention.

Now let me turn around and ask: what is your concern with bouncing the
link state? While I could craft a lightweight version of dev_close/open
that bypasses ndo_stop and ndo_open while shutting down the link watchdog
on behalf of the drivers, it's far more involved, which makes me wonder
whether that's really what you had in mind.

Another, less safe option is to just notify userspace of the rename
without sending down/up events around it, as I don't see *any real
application* caring about the link state when it attempts to detect a
rename. Given that the scope is limited to failover slaves, the chance
of breaking a userspace app would be extremely low in practice.
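
As a rough illustration of why rename detection need not depend on link
state, the Python sketch below tracks names by ifindex. It assumes the
caller has already decoded link notifications into (ifindex, ifname)
pairs; track_renames() is a hypothetical helper, not an existing API,
and real monitoring tools may work differently.

```python
def track_renames(events):
    """Return (ifindex, old_name, new_name) for every observed rename.

    `events` is an iterable of (ifindex, ifname) pairs in arrival order,
    e.g. as decoded from link notifications by the caller (assumption).
    """
    names = {}      # last known name per ifindex
    renames = []
    for ifindex, ifname in events:
        old = names.get(ifindex)
        # Same ifindex with a different name means the device was renamed;
        # no link up/down information is consulted.
        if old is not None and old != ifname:
            renames.append((ifindex, old, ifname))
        names[ifindex] = ifname
    return renames


if __name__ == "__main__":
    seen = [(5, "eth0"), (6, "ens3"), (5, "ens3nsby")]
    print(track_renames(seen))
```

A consumer written this way would keep working whether or not down/up
events are sent around the rename.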

Thanks,
-Siwei