Message-ID: <15507.1569472734@nyx>
Date:   Wed, 25 Sep 2019 21:38:54 -0700
From:   Jay Vosburgh <jay.vosburgh@...onical.com>
To:     Aleksei Zakharov <zaharov@...ectel.ru>
cc:     netdev@...r.kernel.org, "zhangsha (A)" <zhangsha.zhang@...wei.com>
Subject: Re: Fwd: [PATCH] bonding/802.3ad: fix slave initialization states race

Aleksei Zakharov <zaharov@...ectel.ru> wrote:

>Wed, 25 Sep 2019 at 03:31, Jay Vosburgh <jay.vosburgh@...onical.com>:
>>
>> Алексей Захаров wrote:
>> [...]
>> >Right after reboot one of the slaves hangs with actor port state 71
>> >and partner port state 1.
>> >It doesn't send lacpdu and seems to be broken.
>> >Setting link down and up again fixes slave state.
>> [...]
>>
>>         I think I see what failed in the first patch, could you test the
>> following patch?  This one is for net-next, so you'd need to again swap
>> slave_err / netdev_err for the Ubuntu 4.15 kernel.
>>
>I've tested new patch. It seems to work. I can't reproduce the bug
>with this patch.
>There are two types of messages when link becomes up:
>First:
>bond-san: EVENT 1 llu 4294895911 slave eth2
>8021q: adding VLAN 0 to HW filter on device eth2
>bond-san: link status definitely down for interface eth2, disabling it
>mlx4_en: eth2: Link Up
>bond-san: EVENT 4 llu 4294895911 slave eth2
>bond-san: link status up for interface eth2, enabling it in 500 ms
>bond-san: invalid new link 3 on slave eth2
>bond-san: link status definitely up for interface eth2, 10000 Mbps full duplex
>Second:
>bond-san: EVENT 1 llu 4295147594 slave eth2
>8021q: adding VLAN 0 to HW filter on device eth2
>mlx4_en: eth2: Link Up
>bond-san: EVENT 4 llu 4295147594 slave eth2
>bond-san: link status up again after 0 ms for interface eth2
>bond-san: link status definitely up for interface eth2, 10000 Mbps full duplex
>
>These messages (especially "invalid new link") look a bit unclear from
>a sysadmin's point of view.

	The "invalid new link" message appears because bond_miimon_commit
is being asked to commit a new state that is neither UP nor DOWN (3 is
BOND_LINK_BACK).  That shouldn't happen: I looked through the patched
code today and don't see a way to reach that message with the new link
state set to 3.  I'll add some instrumentation and send out another
patch to figure out what's going on.
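
	As a rough illustration of why the value 3 lands on that message,
here is a simplified user-space model of the commit decision.  It is a
sketch, not the driver source: the link state constants match the
bonding headers, but commit_outcome() and the OUTCOME_* names are
hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Link state values used by the bonding driver. */
enum {
    BOND_LINK_NOCHANGE = -1, /* no state change proposed */
    BOND_LINK_UP       = 0,
    BOND_LINK_FAIL     = 1,
    BOND_LINK_DOWN     = 2,
    BOND_LINK_BACK     = 3,
};

/* Hypothetical result codes, for this illustration only. */
enum commit_result {
    OUTCOME_SKIP,
    OUTCOME_COMMIT_UP,
    OUTCOME_COMMIT_DOWN,
    OUTCOME_INVALID,
};

/*
 * Hypothetical model of the commit step: only UP and DOWN are states
 * that can be committed.  Anything else that is not NOCHANGE, such as
 * the transitional BOND_LINK_BACK (3), falls through to the default
 * branch, which is where the "invalid new link" message is printed.
 */
static enum commit_result commit_outcome(int proposed_state)
{
    switch (proposed_state) {
    case BOND_LINK_NOCHANGE:
        return OUTCOME_SKIP;
    case BOND_LINK_UP:
        return OUTCOME_COMMIT_UP;
    case BOND_LINK_DOWN:
        return OUTCOME_COMMIT_DOWN;
    default:
        fprintf(stderr, "invalid new link %d\n", proposed_state);
        return OUTCOME_INVALID;
    }
}
```

	In this model, commit_outcome(BOND_LINK_BACK) takes the default
branch, which corresponds to the "invalid new link 3" line in your log.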

	I don't see the "invalid" message when testing locally, I think
because my network device doesn't transition to carrier up as quickly as
yours.  I thought you were getting BOND_LINK_BACK passed through from
bond_enslave (which calls bond_set_slave_link_state, which will set
link_new_state to BOND_LINK_BACK and leave it there), but
link_new_state is reset first thing in bond_miimon_inspect, so I'm not
sure how it gets into bond_miimon_commit.  My current guess is a
concurrent commit triggered by another slave, which then picks up this
proposed link state change by happenstance.

	-J

---
	-Jay Vosburgh, jay.vosburgh@...onical.com
