Message-ID: <CAKfmpSfbvi4hVCCotFLQB3MzvCx9ZRX7ttVww136pPkm1bN6PQ@mail.gmail.com>
Date: Wed, 9 Dec 2020 10:26:38 -0500
From: Jarod Wilson <jarod@...hat.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Mahesh Bandewar <maheshb@...gle.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Thomas Davis <tadavis@....gov>, Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread

On Tue, Dec 8, 2020 at 3:35 PM Jay Vosburgh <jay.vosburgh@...onical.com> wrote:
>
> Jarod Wilson <jarod@...hat.com> wrote:
>
> >I'm seeing a system get stuck unable to bring a downed interface back up
> >when it's got an updelay value set, behavior which ceased when logging
> >spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
> >system over another network connection, and it seems that the act of
> >spewing logs at all there increases rtnl lock contention, because
> >instrumented code showed bond_mii_monitor() never able to succeed in its
> >attempts to call rtnl_trylock() to actually commit link state changes,
> >leaving the downed link stuck in BOND_LINK_DOWN. The system in question
> >appears to be fine with the log spew being moved to
> >bond_commit_link_state(), which is called after the successful
> >rtnl_trylock(). I'm actually wondering if perhaps we ultimately need/want
> >some bond-specific lock here to prevent racing with bond_close() instead
> >of using rtnl, but this shift of the output appears to work. I believe
> >this started happening when de77ecd4ef02 ("bonding: improve link-status
> >update in mii-monitoring") went in, but I'm not 100% on that.
>
> We use RTNL not to avoid deadlock with bonding itself, but
> because the "commit" side undertakes actions which require RTNL, e.g.,
> various events will eventually call netdev_lower_state_changed.
>
> However, the RTNL acquisition is a trylock to avoid the deadlock
> with bond_close. Moving that out of line here (e.g., putting the commit
> into another work queue event or the like) has the same problem, in that
> bond_close needs to wait for all of the work queue events to finish, and
> it holds RTNL.

Ah, okay, it wasn't clear to me that we actually do need RTNL here,
I'd thought it was just for the deadlock avoidance with bond_close,
based on the comments in the source.
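
Just to make sure I'm reading it the same way you are, this is roughly
the flow I was instrumenting (my own trimmed-down paraphrase of
bond_mii_monitor(), not the verbatim upstream code):

/* paraphrased sketch, most details omitted */
static void bond_mii_monitor(struct work_struct *work)
{
        struct bonding *bond = container_of(work, struct bonding,
                                            mii_work.work);
        unsigned long delay = msecs_to_jiffies(bond->params.miimon);

        rcu_read_lock();
        /* inspect phase: propose per-slave link changes, no RTNL held */
        if (bond_miimon_inspect(bond)) {
                rcu_read_unlock();

                /* trylock rather than lock, so we can't deadlock against
                 * bond_close(), which holds RTNL while flushing this work
                 */
                if (!rtnl_trylock()) {
                        delay = 1;      /* lost the race, retry soon */
                        goto re_arm;
                }

                /* commit phase: needs RTNL for the notifiers it fires */
                bond_miimon_commit(bond);

                rtnl_unlock();
        } else {
                rcu_read_unlock();
        }

re_arm:
        if (bond->params.miimon)
                queue_delayed_work(bond->wq, &bond->mii_work, delay);
}

So when rtnl_trylock() keeps failing, the BOND_LINK_UP proposed by the
inspect phase for the recovered slave never gets committed, which is
exactly what my instrumentation showed.
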
> Also, a dim memory says that the various notification messages
> were mostly placed in the "inspect" phase and not the "commit" phase to
> avoid doing printk-like activities with RTNL held. As a general
> principle, I don't think we want to add more verbiage under RTNL.

Yeah, that's fair.

> >The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
> >separate from the fix for the actual hang, but it eliminates a constant
> >"invalid new link 3 on slave" message seen related to this issue, and it's
> >not actually an invalid state here, so we shouldn't be reporting it as an
> >error.
>
> Do you mean bond_miimon_commit here and not bond_miimon_inspect
> (which already has a case for BOND_LINK_BACK)?

Whoops, yes.

> In principle, bond_miimon_commit should not see _BACK or _FAIL
> state as a new link state, because those states should be managed at the
> bond_miimon_inspect level (as they are the result of updelay and
> downdelay). These states should not be "committed" in the sense of
> causing notifications or doing actions that require RTNL.
>
> My recollection is that the "invalid new link" messages were the
> result of a bug in de77ecd4ef02, which was fixed in 1899bb325149
> ("bonding: fix state transition issue in link monitoring"), but maybe
> the RTNL problem here induces that in some other fashion.
>
> Either way, I believe this message is correct as-is.

Hm, okay, definitely seeing this message pop up regularly when a link
recovers, using a fairly simple reproducer:

slave1=p6p1
slave2=p6p2

# reload bonding with mii monitoring and a 200ms updelay
modprobe -rv bonding
modprobe -v bonding mode=2 miimon=100 updelay=200
ip link set bond0 up
ifenslave bond0 $slave1 $slave2
sleep 10

# flap one slave so the updelay path is exercised on every recovery
while :
do
    ip link set $slave1 down
    sleep 1
    ip link set $slave1 up
    sleep 1
done

I wasn't actually seeing the problem until I was running a 'watch -n 1
"dmesg | tail -n 50"' or similar in a remote ssh session on the host.
I should add the caveat that this was also initially seen on an older
kernel, but with a fairly up-to-date bonding driver, which does
include both de77ecd4ef02 and 1899bb325149. I'm going to keep prodding
with a more recent upstream kernel and see if I can get a better idea of
what's actually going on.
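
For reference, the updelay handling this reproducer exercises lives
entirely on the inspect side; roughly (again my paraphrase of the
switch in bond_miimon_inspect(), not the verbatim code):

        case BOND_LINK_DOWN:
                if (!link_state)
                        continue;
                /* carrier came back: start counting down updelay */
                bond_propose_link_state(slave, BOND_LINK_BACK);
                slave->delay = bond->params.updelay;
                fallthrough;
        case BOND_LINK_BACK:
                if (!link_state) {
                        /* carrier dropped again before updelay expired */
                        bond_propose_link_state(slave, BOND_LINK_DOWN);
                        commit++;
                        continue;
                }
                if (slave->delay <= 0) {
                        /* updelay expired: hand BOND_LINK_UP to commit */
                        bond_propose_link_state(slave, BOND_LINK_UP);
                        commit++;
                        continue;
                }
                slave->delay--;
                break;

If I'm reading that right, the commit side should only ever see _UP or
_DOWN proposals, so the "invalid new link 3 on slave" (3 being
BOND_LINK_BACK) that I keep hitting presumably ties back to the
trylock/commit race above rather than being a legitimate new state.
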
--
Jarod Wilson
jarod@...hat.com