lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8845e49b-3165-e6df-5935-c86278d220d9@broadcom.com>
Date:   Thu, 10 Aug 2017 14:43:06 +0200
From:   Arend van Spriel <arend.vanspriel@...adcom.com>
To:     Kalle Valo <kvalo@...eaurora.org>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Andy Gospodarek <andy@...yhouse.net>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        linux-wireless@...r.kernel.org, James Feeney <james@...ealm.net>
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working
 with wireless drivers



On 10-08-17 07:39, Kalle Valo wrote:
> Hi Mahesh and Andy,
> 
> James Feeney reported that there's a serious regression in bonding
> module since v4.12, it doesn't work with wireless drivers anymore as
> wireless drivers don't report the link speed via ethtool:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=196547
> 
> In the bug report it's said that this commit is the culprit:
> 
> 3f3c278c94dd bonding: fix active-backup transition

This commit references another one. ie. commit c4adfc822bf5 ("bonding: 
make speed, duplex setting consistent with link state"). Before this 
commit the result of __ethtool_get_link_ksettings() was simply ignored.

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -365,9 +365,10 @@ int bond_set_carrier(struct bonding *bond)
  /* Get link speed and duplex from the slave's base driver
   * using ethtool. If for some reason the call fails or the
   * values are invalid, set speed and duplex to -1,
- * and return.
+ * and return. Return 1 if speed or duplex settings are
+ * UNKNOWN; 0 otherwise.
   */
-static void bond_update_speed_duplex(struct slave *slave)
+static int bond_update_speed_duplex(struct slave *slave)
  {
         struct net_device *slave_dev = slave->dev;
         struct ethtool_link_ksettings ecmd;
@@ -377,24 +378,27 @@ static void bond_update_speed_duplex(struct slave 
*slave)
         slave->duplex = DUPLEX_UNKNOWN;

         res = __ethtool_get_link_ksettings(slave_dev, &ecmd);
-       if (res < 0)
-               return;
-
-       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
-               return;
-
+       if (res < 0) {
+               slave->link = BOND_LINK_DOWN;
+               return 1;
+       }
+       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
+               slave->link = BOND_LINK_DOWN;
+               return 1;
+       }

Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves 
setting the link state to the call sites of bond_update_speed_duplex(), 
just not all call sites.

> Is there a fix for this or should that commit be reverted? This seems to
> be a serious regression as there are multiple reports already and we
> should get it fixed for v4.13, and the fix backported to v4.12 stable
> release.

The ethtool callbacks really seem optional. At least in brcmfmac, the 
wireless driver I maintain, I only provide get_drvinfo callback and 
there is no warning triggered upon registering the netdev. The changes 
above now require each netdev to implement the get_link_ksettings 
callback (get_settings is deprecated) or the link is marked as DOWN 
ruling it out to be used as active bond slave. To the end-users who were 
using bonding this is simply a regression. So to fix that both changes 
should be reverted in my opinion.

Now specifically for wireless interfaces we could implement 
get_link_ksettings callback although most of the fields requested are 
meaningless in wireless context. Regarding the speed and half-duplex 
values we raised some concerns in an earlier discussion with James. 
Wireless is always half-duplex as there can be only one (unintended ref 
to [1]). If the reported speed in wifi is difficult. In wifi we have 
txrate and rxrate which are inherently asynchronous and it is a 
per-packet value so it is going to change a lot. Seeing only 4 call 
sites in the bonding code tells me that is not taken into account. All 
in all this shenanigan seems netconf material to me.

Regards,
Arend

[1] https://en.wikipedia.org/wiki/Highlander_(film)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ