Message-ID: <20201208113820.179ed5ca@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
Date:   Tue, 8 Dec 2020 11:38:20 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Jarod Wilson <jarod@...hat.com>
Cc:     linux-kernel@...r.kernel.org, Mahesh Bandewar <maheshb@...gle.com>,
        Jay Vosburgh <j.vosburgh@...il.com>,
        Veaceslav Falico <vfalico@...il.com>,
        Andy Gospodarek <andy@...yhouse.net>,
        "David S. Miller" <davem@...emloft.net>,
        Thomas Davis <tadavis@....gov>, netdev@...r.kernel.org
Subject: Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

On Sat,  5 Dec 2020 18:43:54 -0500 Jarod Wilson wrote:
> I'm seeing a system get stuck unable to bring a downed interface back up
> when it's got an updelay value set, behavior which ceased when logging
> spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
> system over another network connection, and it seems that the act of
> spewing logs at all there increases rtnl lock contention, because
> instrumented code showed bond_mii_monitor() never able to succeed in its
> attempts to call rtnl_trylock() to actually commit link state changes,
> leaving the downed link stuck in BOND_LINK_DOWN. The system in question
> appears to be fine with the log spew being moved to
> bond_commit_link_state(), which is called after the successful
> rtnl_trylock().

But it's not called under rtnl_lock AFAICT. So something else is also
spewing messages?

While bond_commit_link_state() _is_ called under the lock. So you're
increasing the retry rate, by putting the slow operation under the
lock, is that right?
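
To make the concern concrete, here is a minimal toy model in C. It is a sketch under stated assumptions, not bonding or rtnl code: lock_is_held() and count_failed_trylocks() are hypothetical names, time is counted in abstract ticks, and a single other holder is assumed to take the lock at the start of every period and keep it for hold ticks. It only illustrates that lengthening the time spent under a lock raises the number of failed trylock-style polls by anyone else who wants it.

```c
#include <stdbool.h>

/*
 * Toy model only (hypothetical, not kernel code): one holder takes the
 * lock at the start of every `period` ticks and keeps it for `hold` ticks.
 */
static bool lock_is_held(int tick, int period, int hold)
{
    return (tick % period) < hold;
}

/*
 * Count how many trylock-style polls fail when a poller checks the lock
 * every `poll_every` ticks, starting at tick 0, for ticks below `total`.
 */
int count_failed_trylocks(int total, int period, int hold, int poll_every)
{
    int failed = 0;

    for (int tick = 0; tick < total; tick += poll_every) {
        if (lock_is_held(tick, period, hold))
            failed++;
    }
    return failed;
}
```

In this model, raising hold from 2 to 6 ticks out of every 10 (for example, by moving slow printing under the lock) makes count_failed_trylocks(100, 10, 6, 3) larger than count_failed_trylocks(100, 10, 2, 3). Whether the real system behaves this way depends on who else is contending for rtnl, which the model does not capture.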

Also isn't bond_commit_link_state() called from many more places?
So we're adding new prints, effectively?

> I'm actually wondering if perhaps we ultimately need/want
> some bond-specific lock here to prevent racing with bond_close() instead
> of using rtnl, but this shift of the output appears to work. I believe
> this started happening when de77ecd4ef02 ("bonding: improve link-status
> update in mii-monitoring") went in, but I'm not 100% on that.
> 
> The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
> separate from the fix for the actual hang, but it eliminates a constant
> "invalid new link 3 on slave" message seen related to this issue, and it's
> not actually an invalid state here, so we shouldn't be reporting it as an
> error.

Let's make it a separate patch, then.
