netdev - Re: Bond recovery from BOND_LINK

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0b5b1e0-3e62-8526-3628-1ed2afecacc1@hpe.com>
Date:   Thu, 2 Nov 2017 08:47:25 -0400
From:   Alex Sidorenko <alexandre.sidorenko@....com>
To:     Jay Vosburgh <jay.vosburgh@...onical.com>,
        Jarod Wilson <jarod@...hat.com>
Cc:     netdev@...r.kernel.org
Subject: Re: Bond recovery from BOND_LINK_FAIL state not working



On 11/02/2017 12:51 AM, Jay Vosburgh wrote:
> Jarod Wilson <jarod@...hat.com> wrote:
> 
>> On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
>>> Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>>>
>>>> Alex Sidorenko <alexandre.sidorenko@....com> wrote:
>>>>
>>>>> The problem has been found while trying to deploy RHEL7 on HPE Synergy
>>>>> platform, it is seen both in customer's environment and in HPE test lab.
>>>>>
>>>>> There are several bonds configured in TLB mode and miimon=100, all other
>>>>> options are default. Slaves are connected to VirtualConnect
>>>>> modules. Rebooting a VC module should bring one bond slave (ens3f0) down
>>>>> temporarily, but not another one (ens3f1). But what we see is
>>>>>
>>>>> Oct 24 10:37:12 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
>>>
>>> 	In net-next, I don't see a path in the code that will lead to
>>> this message, as it would apparently require entering
>>> bond_miimon_inspect in state BOND_LINK_FAIL but with downdelay set to 0.
>>> If downdelay is 0, the code will transition to BOND_LINK_DOWN and not
>>> remain in _FAIL state.
>>
>> The kernel in question is laden with a fair bit of additional debug spew,
>> as we were going back and forth, trying to isolate where things were going
>> wrong.  That was indeed from the BOND_LINK_FAIL state in
>> bond_miimon_inspect, inside the if (link_state) clause though, so after
>> commit++, there's a continue, which ... does what now? Doesn't it take us
>> back to the top of the bond_for_each_slave_rcu() loop, so we bypass the
>> next few lines of code that would have led to a transition to
>> BOND_LINK_DOWN?
> 
> 	Just to confirm: your downdelay is 0, correct?

Correct.

> 
> 	And, do you get any other log messages other than "link status
> up again after 0 ms"?

Yes, here are some messages (from an early instrumentation):

Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_inspect returned non-zero
Oct 24 10:56:27 SYDC1LNX kernel: bond0: committing link state for interface ens3f0, 20000 Mbps full 
duplex
Oct 24 10:56:27 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f0 now: 1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: committing link state for interface ens3f1, 20000 Mbps full 
duplex
Oct 24 10:56:27 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f1 now: 1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_DOWN
Oct 24 10:56:27 SYDC1LNX kernel: bond0: link status definitely down for interface ens3f0, disabling it
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_select_active_slave: best_slave != curr_active_slave
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_select_active_slave: best_slave is NULL, this is 
probably bad
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_change_active_slave: old_active: ens3f0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_change_active_slave: new_active: NULL
Oct 24 10:56:27 SYDC1LNX kernel: bond0: Removing MAC from old_active
Oct 24 10:56:27 SYDC1LNX kernel: bond0: ALB and TLB modes should call bond_alb_handle_active_change
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_set_carrier: turning carrier off
Oct 24 10:56:27 SYDC1LNX kernel: bond0: now running without any active interface!
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_NOCHANGE on slave ens3f1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_inspect returned non-zero
Oct 24 10:56:27 SYDC1LNX kernel: bond0: committing link state for interface ens3f0, 20000 Mbps full 
duplex
Oct 24 10:56:27 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f0 now: 0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: committing link state for interface ens3f1, 20000 Mbps full 
duplex
Oct 24 10:56:27 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f1 now: 0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_NOCHANGE on slave ens3f0
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f1
Oct 24 10:56:27 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_NOCHANGE on slave ens3f1


That is, we never see ens3f1 going to BOND_LINK_DOWN and it continues staying in 
BOND_LINK_NOCHANGE/BOND_LINK_FAIL


> 
> 	To answer your question, yes, the "if (link_state) {" block in
> the BOND_LINK_FAIL case of bond_miimon_inspect ends in continue, but
> this path is nominally for the downdelay logic.  If downdelay is active
> and the link recovers before the delay expires, the link should never
> have moved to BOND_LINK_DOWN.  The commit++ causes bond_miimon_inspect
> to return nonzero, causing in turn the bond_propose_link_state change to
> BOND_LINK_FAIL state to be committed.  This path deliberately does not
> set slave->new_link, as downdelay is purposely delaying the transition
> to BOND_LINK_DOWN.
> 
> 	If downdelay is 0, the slave->link should not persist in
> BOND_LINK_FAIL state; it should set new_link = BOND_LINK_DOWN which will
> cause a transition in bond_miimon_commit.  The bond_propose_link_state
> call to set BOND_LINK_FAIL in the BOND_LINK_UP case will be committed in
> bond_mii_monitor prior to calling bond_miimon_commit, which will in turn
> do the transition to _DOWN state.  In this case, the BOND_LINK_FAIL case
> "if (link_state) {" block should never be entered.

I totally agree with your description of transition logic, and this is why
we were puzzled by how this can occur until we noticed NetworkManager
messages around this time and decided to run a test without it.
Without NM, everything works as expected. After that, adding more
instrumentation, we have found that we do not propose BOND_LINK_FAIL inside
bond_miimon_inspect() but elsewhere (NetworkManager?).

> 
> 	I'm a little leery of adding the state transition you suggest
> without understanding how this situation arose, as it shouldn't get into
> this condition in the first place.

I think that adding such transition could be a reasonable safety measure
and should help in diagnosing problems. It can be done only when
bond->params.downdelay==0, adding pr_warn_once() with text suggesting that
this is unexpected.

Alex

> 
> 	-J
> 
>> ...
>>>> 	Your patch does not apply to net-next, so I'm not absolutely
>>>> sure where this is, but presuming that this is in the BOND_LINK_FAIL
>>>> case of the switch, it looks like both BOND_LINK_FAIL and BOND_LINK_BACK
>>>> will have the issue that if the link recovers or fails, respectively,
>>>> within the delay window (for down/updelay > 0) it won't set a
>>>> slave->new_link.
>>>>
>>>> 	Looks like this got lost somewhere along the line, as originally
>>>> the transition back to UP (or DOWN) happened immediately, and that has
>>>> been lost somewhere.
>>>>
>>>> 	I'll have to dig out when that broke, but I'll see about a test
>>>> patch this afternoon.
>>>
>>> 	The case I was concerned with was moved around; the proposed
>>> state is committed in bond_mii_monitor.  But to commit to _FAIL state,
>>> the downdelay would have to be > 0.  I'm not seeing any errors in
>>> net-next; can you reproduce your erroneous behavior on net-next?
>>
>> I can try to get a net-next-ish kernel into their hands, but the bonding
>> driver we're working with here is quite close to current net-next already,
>> so I'm fairly confident the same thing will happen.
>>
>> -- 
>> Jarod Wilson
>> jarod@...hat.com
> 
> ---
> 	-Jay Vosburgh, jay.vosburgh@...onical.com
> 

-- 
------------------------------------------------------------------
Alex Sidorenko	email: asid@....com
ERT  Linux 	Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------