lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Thu, 03 Mar 2011 12:07:36 -0800
From:	Jay Vosburgh <fubar@...ibm.com>
To:	Nils Carlson <nils.carlson@...csson.com>
cc:	bonding-devel@...ts.sourceforge.net, netdev@...r.kernel.org
Subject: Re: [PATCH] bonding 802.3ad: Fix the state machine locking

Nils Carlson <nils.carlson@...csson.com> wrote:

>The current implementation only locked around the rx state machine,
>but the rx state machine reads data touched by other state machines
>as well, so in a very ugly scenario we would see the rx state machine
>reading bad values as data was being overwritten by other state
>machines. This patch moves the bond_3ad locking to protect all the
>state machines from concurrency issues.

	This looks pretty obvious on inspection; I'm surprised it hasn't
come up before.  I haven't tested it, but it seems clear that at least
the port->sm_vars field needs protection.

	I have one suggestion for a comment change, which is below, to
make it cleared what data needs mutexing.  I'd also add the following
into the description somewhere:

	The ad_rx_machine, ad_periodic_machine and
ad_port_selection_logic functions all inspect and alter common fields
within the port structure.  Previous to this patch, only the
ad_rx_machines were mutexed, and the periodic and port_selection could
run unmutexed against an rx_machine trigged by an arriving LACPDU.

	I'm a little torn on the change of nomenclature of "rx" to
"state" lock, mostly because it makes the functional change harder to
follow in the patch.  Could you split this into two patches, one to fix
the locking problem, and one to change the name?

>Signed-off-by: Nils Carlson <nils.carlson@...csson.com>
>---
> drivers/net/bonding/bond_3ad.c |   32 +++++++++++++++++++-------------
> drivers/net/bonding/bond_3ad.h |    2 +-
> 2 files changed, 20 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 1024ae1..bbc473b 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -281,23 +281,23 @@ static inline int __check_agg_selection_timer(struct port *port)
> }
>
> /**
>- * __get_rx_machine_lock - lock the port's RX machine
>+ * __get_state_machine_lock - lock the port's state machine
>  * @port: the port we're looking at
>  *
>  */
>-static inline void __get_rx_machine_lock(struct port *port)
>+static inline void __get_state_machine_lock(struct port *port)
> {
>-	spin_lock_bh(&(SLAVE_AD_INFO(port->slave).rx_machine_lock));
>+	spin_lock_bh(&(SLAVE_AD_INFO(port->slave).state_machine_lock));
> }
>
> /**
>- * __release_rx_machine_lock - unlock the port's RX machine
>+ * __release_state_machine_lock - unlock the port's state machine
>  * @port: the port we're looking at
>  *
>  */
>-static inline void __release_rx_machine_lock(struct port *port)
>+static inline void __release_state_machine_lock(struct port *port)
> {
>-	spin_unlock_bh(&(SLAVE_AD_INFO(port->slave).rx_machine_lock));
>+	spin_unlock_bh(&(SLAVE_AD_INFO(port->slave).state_machine_lock));
> }
>
> /**
>@@ -388,14 +388,14 @@ static u8 __get_duplex(struct port *port)
> }
>
> /**
>- * __initialize_port_locks - initialize a port's RX machine spinlock
>+ * __initialize_port_locks - initialize a port's state machine spinlock
>  * @port: the port we're looking at
>  *
>  */
> static inline void __initialize_port_locks(struct port *port)
> {
> 	// make sure it isn't called twice
>-	spin_lock_init(&(SLAVE_AD_INFO(port->slave).rx_machine_lock));
>+	spin_lock_init(&(SLAVE_AD_INFO(port->slave).state_machine_lock));
> }
>
> //conversions
>@@ -1025,9 +1025,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> {
> 	rx_states_t last_state;
>
>-	// Lock to prevent 2 instances of this function to run simultaneously(rx interrupt and periodic machine callback)
>-	__get_rx_machine_lock(port);
>-
> 	// keep current State Machine state to compare later if it was changed
> 	last_state = port->sm_rx_state;
>
>@@ -1133,7 +1130,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 				pr_err("%s: An illegal loopback occurred on adapter (%s).\n"
> 				       "Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports\n",
> 				       port->slave->dev->master->name, port->slave->dev->name);
>-				__release_rx_machine_lock(port);
> 				return;
> 			}
> 			__update_selected(lacpdu, port);
>@@ -1153,7 +1149,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 			break;
> 		}
> 	}
>-	__release_rx_machine_lock(port);
> }
>
> /**
>@@ -2155,6 +2150,12 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
> 			goto re_arm;
> 		}
>
>+		/* Lock around all the state machines to protect data that may
>+		 * be accessed also from the ad_rx_machine running in the
>+		 * interrupt handler.

	The reference to interrupt handler confused me initially; I think
this comment would be better stated as:

		/* Lock around state machines to protect data accessed
		 * by all (e.g., port->sm_vars).  ad_rx_machine may run
		 * concurrently due to incoming LACPDU.
		 */



>+		 */
>+		__get_state_machine_lock(port);
>+
> 		ad_rx_machine(NULL, port);
> 		ad_periodic_machine(port);
> 		ad_port_selection_logic(port);
>@@ -2164,6 +2165,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
> 		// turn off the BEGIN bit, since we already handled it
> 		if (port->sm_vars & AD_PORT_BEGIN)
> 			port->sm_vars &= ~AD_PORT_BEGIN;
>+
>+		__release_state_machine_lock(port);
> 	}
>
> re_arm:
>@@ -2200,7 +2203,10 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
> 		case AD_TYPE_LACPDU:
> 			pr_debug("Received LACPDU on port %d\n",
> 				 port->actor_port_number);
>+			/* Protect against concurrent state machines */
>+			__get_state_machine_lock(port);
> 			ad_rx_machine(lacpdu, port);
>+			__release_state_machine_lock(port);
> 			break;
>
> 		case AD_TYPE_MARKER:
>diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
>index 2c46a15..9182506 100644
>--- a/drivers/net/bonding/bond_3ad.h
>+++ b/drivers/net/bonding/bond_3ad.h
>@@ -264,7 +264,7 @@ struct ad_bond_info {
> struct ad_slave_info {
> 	struct aggregator aggregator;	    // 802.3ad aggregator structure
> 	struct port port;		    // 802.3ad port structure
>-	spinlock_t rx_machine_lock; // To avoid race condition between callback and receive interrupt
>+	spinlock_t state_machine_lock; // To avoid race condition between callback and receive interrupt

	Perhaps change this comment, also, to "mutex state machine
vs. LACPDU".

	-J

> 	u16 id;
> };
>
>-- 
>1.7.1
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@...ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ