lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 23 May 2017 12:32:46 -0700
From:   Nithin Sujir <nsujir@...tri.com>
To:     netdev@...r.kernel.org
Subject: bond link state mismatch, rtnl_trylock() vs rtnl_lock()

Hi,
We're encountering a problem in 4.4 LTS where, rarely, the bond link 
state is not updated when the slave link changes.

I've traced the issue to the arp monitor unable to get the rtnl lock. 
The sequence resulting in failure is as below.

bond_loadbalance_arp_mon() periodically called, if slave link is _down_, 
it checks if the slave is sending/receiving packets. If it is, it sets 
flags to be processed later down the function for bond link update. 
However, it sets the slave->link right away.

                 if (slave->link != BOND_LINK_UP) {
                         if (bond_time_in_interval(bond, trans_start, 1) &&
                             bond_time_in_interval(bond, slave->last_rx, 
1)) {

                                 slave->link  = BOND_LINK_UP;
                                 slave_state_changed = 1;


Later down the function, it tries to get the rtnl_lock. If it doesn't 
get it, it rearms and returns.

         if (do_failover || slave_state_changed) {
                 if (!rtnl_trylock())
                         goto re_arm; <-- returns here

                 if (slave_state_changed) {
                         bond_slave_state_change(bond);

This is the problem. The next time this function is called, the 
slave->link is already marked UP. And we will never update the bond link 
state to UP.

Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.

Is this the right way to fix it? If it is, I can submit this formally.

What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some 
places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't 
find much via a google search or in Documentation/.

Thanks,
Nithin.

--------------------

diff --git a/drivers/net/bonding/bond_main.c 
b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct 
work_struct *work)
         rcu_read_unlock();

         if (do_failover || slave_state_changed) {
-               if (!rtnl_trylock())
-                       goto re_arm;
+               rtnl_lock();

                 if (slave_state_changed) {
                         bond_slave_state_change(bond);


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ