Message-ID: <51B1A9DA.9020600@redhat.com>
Date: Fri, 07 Jun 2013 11:37:30 +0200
From: Nikolay Aleksandrov <nikolay@...hat.com>
To: Jay Vosburgh <fubar@...ibm.com>
CC: netdev@...r.kernel.org, andy@...yhouse.net, davem@...emloft.net
Subject: Re: [PATCH 2/2] bonding: fix igmp_retrans type and two related races
On 07/06/13 03:00, Jay Vosburgh wrote:
> nikolay@...hat.com wrote:
>
>> From: Nikolay Aleksandrov <nikolay@...hat.com>
>>
>> First, the type of igmp_retrans (the actual counter behind the
>> resend_igmp parameter) is changed to u8 so it can store values up to
>> 255 (as per the documentation). There are two races that were hidden
>> there and are easy to trigger after the previous fix. The first is
>> between bond_resend_igmp_join_requests and bond_change_active_slave,
>> where igmp_retrans is set and can be altered by the periodic work. The
>> second is between multiple running instances of the periodic work
>> (upon execution it can be re-scheduled for immediate execution, which
>> can make the counter go below 0; in the unsigned case it wraps around
>> and leads to unnecessary IGMP retransmissions).
>> Since in bond_change_active_slave bond->lock is held for reading and
>> curr_slave_lock for writing, we use curr_slave_lock for mutual
>> exclusion. We can't drop them, as there are cases where RTNL is not
>> held when bond_change_active_slave is called. RCU is unlocked in
>> bond_resend_igmp_join_requests before taking curr_slave_lock since we
>> don't need it there and it's pointless to delay releasing it.
>
> My first thought is that it would be much simpler to change the
> limit in the documentation and code from 255 to 127 and be done with it.
> I'm skeptical that anybody uses values for igmp_retrans even as high as
> 10, much less 100 (which would take 20 seconds to complete at 5 per
> second).
>
> That said, this is technically correct, although I have one
> question, below.
>
Yes, I was thinking the same thing at first and even discussed it with
Andy, although the race between bond_resend_igmp_join_requests and
bond_change_active_slave would still be there even with a lower limit.
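Just to make that first race concrete, here is a tiny userspace stand-in
(my simplification, not the kernel code; the loop counts and the value 3
are made up) where one thread plays bond_change_active_slave() re-arming
the counter and the other plays the periodic work decrementing it, with
no common lock between them:

#include <pthread.h>
#include <stdio.h>

static unsigned char igmp_retrans;	/* stands in for bond->igmp_retrans */

/* plays the periodic work: decrement the counter while there is more to do */
static void *periodic_work(void *unused)
{
	for (int i = 0; i < 100000; i++)
		if (igmp_retrans > 0)
			--igmp_retrans;
	return NULL;
}

/* plays bond_change_active_slave(): re-arm the counter on failover */
static void *change_active_slave(void *unused)
{
	for (int i = 0; i < 100000; i++)
		igmp_retrans = 3;	/* stands in for bond->params.resend_igmp */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, periodic_work, NULL);
	pthread_create(&t2, NULL, change_active_slave, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/* the final value depends entirely on how the two interleave */
	printf("igmp_retrans = %u\n", igmp_retrans);
	return 0;
}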
>> Signed-off-by: Nikolay Aleksandrov <nikolay@...hat.com>
>> ---
>> drivers/net/bonding/bond_main.c | 15 +++++++++++----
>> drivers/net/bonding/bonding.h | 2 +-
>> 2 files changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 473633a..02d9ae7 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -764,8 +764,8 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> struct net_device *bond_dev, *vlan_dev, *upper_dev;
>> struct vlan_entry *vlan;
>>
>> - rcu_read_lock();
>> read_lock(&bond->lock);
>> + rcu_read_lock();
>>
>> bond_dev = bond->dev;
>>
>> @@ -787,12 +787,19 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
>> if (vlan_dev)
>> __bond_resend_igmp_join_requests(vlan_dev);
>> }
>> + rcu_read_unlock();
>>
>> - if (--bond->igmp_retrans > 0)
>> + /* We use curr_slave_lock to protect against concurrent access to
>> + * igmp_retrans from multiple running instances of this function and
>> + * bond_change_active_slave
>> + */
>> + write_lock_bh(&bond->curr_slave_lock);
>> + if (bond->igmp_retrans > 1) {
>> + bond->igmp_retrans--;
>> queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
>
> Why split out the -- from the comparison?
>
> -J
This one was very tricky, because we can have more than 2 instances
running concurrently, and if we unconditionally decrement the value it
can still drop below 0 (i.e. wrap, since it's unsigned). Example with 3
instances running and igmp_retrans == 1 (with the check
bond->igmp_retrans-- > 1):
f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
f2 then passes, doesn't re-schedule, but decrements - igmp_retrans wraps to 255
f3 does the unnecessary retransmissions.
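To show just the arithmetic, here is a tiny userspace demo (the three
instances run one after another here purely to illustrate the u8 wrap;
f1/f2/f3 match the example above, and the combined check mirrors the
bond->igmp_retrans-- > 1 form, not the patched code):

#include <stdio.h>

static unsigned char igmp_retrans = 1;	/* u8 counter, as in the patch */

/* mimics the end of bond_resend_igmp_join_requests() with the combined
 * check: the decrement happens unconditionally as part of the test */
static void run_instance(const char *name)
{
	if (igmp_retrans-- > 1)
		printf("%s: would re-schedule\n", name);
	printf("%s: igmp_retrans is now %u\n", name, igmp_retrans);
}

int main(void)
{
	run_instance("f1");	/* 1 -> 0, no re-schedule */
	run_instance("f2");	/* 0 -> 255, the u8 wraps */
	run_instance("f3");	/* sees 255, keeps the retransmissions going */
	return 0;
}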
I also have an interesting solution that uses cmpxchg instead of
curr_slave_lock, but this one is more straightforward, and since this is
not a fast path I think it's preferable.
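For reference, roughly the shape of the cmpxchg variant (untested sketch,
written with C11 atomics here just to show the idea; the kernel version
would use cmpxchg() on the counter): only the instance that actually
moves the counter from a value > 1 down by one gets to re-schedule, so it
can never underflow or wrap:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic unsigned char igmp_retrans = 3;	/* stands in for bond->igmp_retrans */

/* returns true if this instance claimed one more round and should re-schedule */
static bool dec_and_reschedule(void)
{
	unsigned char old = atomic_load(&igmp_retrans);

	do {
		if (old <= 1)
			return false;	/* last (or already finished) round */
	} while (!atomic_compare_exchange_weak(&igmp_retrans, &old, old - 1));

	return true;
}

int main(void)
{
	while (dec_and_reschedule())
		printf("re-scheduling, igmp_retrans now %u\n",
		       atomic_load(&igmp_retrans));
	return 0;
}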
Nik
>> -
>> + }
>> + write_unlock_bh(&bond->curr_slave_lock);
>> read_unlock(&bond->lock);
>> - rcu_read_unlock();
>> }
>>
>> static void bond_resend_igmp_join_requests_delayed(struct work_struct *work)
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 2baec24..f989e15 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -225,7 +225,7 @@ struct bonding {
>> rwlock_t curr_slave_lock;
>> u8 send_peer_notif;
>> s8 setup_by_slave;
>> - s8 igmp_retrans;
>> + u8 igmp_retrans;
>> #ifdef CONFIG_PROC_FS
>> struct proc_dir_entry *proc_entry;
>> char proc_file_name[IFNAMSIZ];
>> --
>> 1.8.1.4
>>
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
>