lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADps9UA7S2betC-+5sOXjYWmDC8gnk6JJ9aEueEOoYAC3DU8hg@mail.gmail.com>
Date:   Thu, 10 Aug 2017 19:52:03 +0200
From:   Andreas Born <futur.andy@...glemail.com>
To:     Arend van Spriel <arend.vanspriel@...adcom.com>
Cc:     Kalle Valo <kvalo@...eaurora.org>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Andy Gospodarek <andy@...yhouse.net>,
        David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        linux-wireless@...r.kernel.org, James Feeney <james@...ealm.net>
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working
 with wireless drivers

Hi everyone,

2017-08-10 14:43 GMT+02:00 Arend van Spriel <arend.vanspriel@...adcom.com>:
>
>
> On 10-08-17 07:39, Kalle Valo wrote:
>>
>> Hi Mahesh and Andy,
>>
>> James Feeney reported that there's a serious regression in bonding
>> module since v4.12, it doesn't work with wireless drivers anymore as
>> wireless drivers don't report the link speed via ethtool:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=196547
>>
>> In the bug report it's said that this commit is the culprit:
>>
>> 3f3c278c94dd bonding: fix active-backup transition
>
>
> This commit references another one. ie. commit c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state"). Before this commit the result of __ethtool_get_link_ksettings() was simply ignored.

Actually it was not simply ignored in any case: Further down in
bond_miimon_commit() there's a conditional call to
bond_3ad_handle_link_change() which triggers an update using
__get_link_speed() to actually access the result. A similar handler is
also called for lb modes.

>
> Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves setting the link state to the call sites of bond_update_speed_duplex(), just not all call sites.
>
>> Is there a fix for this or should that commit be reverted? This seems to
>> be a serious regression as there are multiple reports already and we
>> should get it fixed for v4.13, and the fix backported to v4.12 stable
>> release.
>
>
> The ethtool callbacks really seem optional. At least in brcmfmac, the wireless driver I maintain, I only provide get_drvinfo callback and there is no warning triggered upon registering the netdev. The changes above now require each netdev to implement the get_link_ksettings callback (get_settings is deprecated) or the link is marked as DOWN ruling it out to be used as active bond slave. To the end-users who were using bonding this is simply a regression. So to fix that both changes should be reverted in my opinion.

Yes, also to me as user of a wireless slave in an active-backup bond
it's clearly a regression. But only partially for some modes like
active-backup since the bonding documentation [1] clearly lists as a
prerequisite

1) for 802.3ad: "Ethtool support in the base drivers for retrieving
the speed and duplex of each slave."
2) for tlb/alb: "Ethtool support in the base drivers for retrieving
the speed of each slave."

This was previously not directly enforced in the bonding code and thus
probably occasionally caused unexpected behavior. At least such
behavior is what to my understanding commit c4adfc822bf5 ("bonding:
make speed, duplex setting consistent with link state") and
3f3c278c94dd ("bonding: fix active-backup transition") intend to fix
with an apparent focus on 802.3ad. However those commits went too far
by requiring a get_link_ksettings implementation by every slave driver
REGARDLESS of the bond mode.

Earlier today I submitted the patch (bonding: require speed/duplex
only for 802.3ad, alb and tlb) [2] that only partially reverts what is
a regression following my aforementioned logic. This seems to me like
the best solution in the short term since it should satisfy both
usergroups represented by Mahesh and James and restores consistence
with the bonding documentation. James already commented approvingly on
that patch in the bug report. [3]

Regards
Andreas

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt
[2] https://www.spinics.net/lists/netdev/msg448662.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=196547#c10

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ