netdev - Re: [RFC PATCH v2 net-next 4/7] bridge: Automatically manage port promiscuous mode.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140430153818.GB23944@redhat.com>
Date:	Wed, 30 Apr 2014 18:38:18 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Vlad Yasevich <vyasevich@...il.com>
Cc:	Vlad Yasevich <vyasevic@...hat.com>, netdev@...r.kernel.org,
	bridge@...ts.linux-foundation.org, shemminger@...tta.com,
	jhs@...atatu.com, john.r.fastabend@...el.com
Subject: Re: [RFC PATCH v2 net-next 4/7] bridge: Automatically manage port
 promiscuous mode.

On Wed, Apr 30, 2014 at 10:17:21AM -0400, Vlad Yasevich wrote:
> On 04/30/2014 07:46 AM, Michael S. Tsirkin wrote:
> > On Tue, Apr 29, 2014 at 05:20:25PM -0400, Vlad Yasevich wrote:
> >> When there is only 1 port that can learn mac addresses and flood
> >> unknown traffic, this port can function in non-promiscouse mode.
> > 
> > typo: non-promiscous
> > 
> > I think the code is correct here. But I'm not sure the explanation
> > addresses the question that has been raised in the past:
> > why is it correct to tie the output (flood) and input (learning)
> > together and have it affect input (promisc).
> > 
> 
> I found that thinking of the code in terms of possible configuration
> examples helped clarify the behavior in my head.

I guess different people just think differently.
Maybe include both the general rule, then add:
examples: and list them.

> > I would offer the following explanation to address the above
> > question:
> > 
> > 
> > Consider a specific port X. All traffic on its input will be
> > output through one of the other ports (or dropped).
> > Thus if *no* other port need to flood traffic
> > on its output, then the combination of FDB entries for other
> > ports completely specifies the set of legal mac addresses on the input
> > of port X.
> 
> Right.  I tried to lead to this with the first two examples where one is
> an extension of another.  It is sometimes easier to think about the
> behavior when you have specifics instead of abstract concepts.
> May be I didn't do this very well.
> 
> > 
> > If, additionally, learning is disabled for all other ports,
> > these FDB entries for other ports are static, so this set
> > of mac addresses on port X does not change much.
> > 
> > This patch detects that no other port needs to flood or has learning
> > enabled, and disables promisc mode for port X, instead programming the
> > set of output mac addresses for other ports from the static fdb into the
> > hardware filtering table in the underlying device.
> > 
> > Note: an alternative would be to simply check that no other port needs
> > to flood, however, this would mean that non-static FDB updates by
> > learning on other ports need to be propagated to hardware on port X,
> > which is harder to implement efficiently.
> 
> Not only this, but without some sort of mac validation, this is a source
> of attack against the bridge as it a single malicious node can now
> saturate the hw filter and cause the bridge to consume extra memory.
> > 
> > It might be a good idea to put something like this above in
> > commit log, along with the examples you provide below.
> > A shortened version might be helpful in code comments as well.
> > 
> >> The simplest example of this configuration is a bridge with only
> >> 1 port.  In this case, the bridge can't forward traffic anywhere.
> >> All it can do is recevie traffic destined to it.  It doesn't need
> >> to the interface into promiscouse mode.
> >> A more complicated example is a bridge with 2 or more ports where
> >> only 1 port remains capable of auto-discover and all others disable
> >> mac learning and flooding thus requiring static configuration.  You
> >> could think of this configuraiton as mimicking and edge relay with
> >> 1 uplink port (the one can still learn things), and N downlink ports
> >> that have to be manually programmed by user or managment entity.
> >> In this case, the 1 uplink doesn't really have
> >> to be promiscous either since the bridge would have all the necessary
> >> data in the fdb through static configuration.  This one port can be
> >> manually programmed to allow to recieve necessary traffic.
> >> Another case where there is no need for promiscuous mode is when
> >> there are no "proverbial" uplinks and all ports are manually managed.
> >> All necessary information will be provided, so anything an interface
> >> receives in promiscouse mode is extra and could even be considered
> >> security threat.
> > 
> > Might be a good idea to clarify: the security problem is that with
> > hardware filtering by MAC being off, it's easier to cause load spikes on
> > multiple bridges on the LAN.
> > 
> > 
> >> This patch supports the above scenarios.
> >>
> >> Signed-off-by: Vlad Yasevich <vyasevic@...hat.com>
> > 
> > 
> > I think what this patchset is trying to do correct.
> > Minor comments below.
> > 
> > 
> > Would bisect hit partially broken configurations if
> > it picks patch 4 but not 5-7?
> > If yes it's best to smash patches 5-7 in with this one.
> 
> I think I can restructure patch 5 so that it can be before this one
> and stand on its own.
> Patches 6 and 7 could potentially break some configurations during
> bisect.  They are small enough that they can be squashed, but anything
> that makes this patch more complicated to review is not so good IMO.
> I'll give it a go and see what it looks like.
> 
> > 
> > 
> >> ---
> >>  net/bridge/br_if.c      | 85 +++++++++++++++++++++++++++++++++++++++++++++----
> >>  net/bridge/br_private.h |  2 ++
> >>  2 files changed, 81 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> >> index 7e60c4c..5a26ca2 100644
> >> --- a/net/bridge/br_if.c
> >> +++ b/net/bridge/br_if.c
> >> @@ -85,6 +85,64 @@ void br_port_carrier_check(struct net_bridge_port *p)
> >>  	spin_unlock_bh(&br->lock);
> >>  }
> >>  
> >> +static void br_port_set_promisc(struct net_bridge_port *p)
> >> +{
> >> +	int err = 0;
> >> +
> >> +	if (br_is_promisc_port(p))
> >> +		return;
> >> +
> >> +	err = dev_set_promiscuity(p->dev, 1);
> >> +	if (err)
> >> +		return;
> > 
> > Can this happen here? We can't recover so maybe warn if it does?
> > 
> 
> It can happen if the device promiscuity count overflows.  If that
> happens a warning is already printed.
> The good thing is that the device is already in promisc mode so
> everything still works.

I see. Yes I think that is enough.

> >> +
> >> +	br_fdb_unsync_static(p->br, p);
> >> +	p->flags |= BR_PROMISC;
> >> +}
> >> +
> >> +static void br_port_clear_promisc(struct net_bridge_port *p)
> >> +{
> >> +	int err;
> >> +
> >> +	if (!br_is_promisc_port(p))
> >> +		return;
> >> +
> >> +	/* Since we'll be clearing the promisc mode, program the port
> >> +	 * first so that we don't have interruption in traffic.
> >> +	 */
> >> +	err = br_fdb_sync_static(p->br, p);
> >> +	if (err)
> >> +		return;
> >> +
> >> +	dev_set_promiscuity(p->dev, -1);
> >> +	p->flags &= ~BR_PROMISC;
> >> +}
> >> +
> >> +/* When a port is added or removed or when certain port flags
> >> + * change, this function is called to automatically mange
> >> + * promiscuity setting of all the bridge ports.  We are always called
> >> + * under RTNL so can skip using rcu primitives.
> >> + */
> >> +static void br_manage_promisc(struct net_bridge *br)
> >> +{
> >> +	struct net_bridge_port *p;
> >> +
> >> +	/* Algorithm is simple.
> > 
> > I think it's best to drop this line.
> > It doesn't really clarify anything :)
> > 
> >>  If all the port
> > 
> > all the ports?
> > 
> >> require static
> >> +	 * configuration, we know everything and can simply write
> >> +	 * that down to the ports and clear promisc.
> >> +	 * If only 1 port is automatic and all the others require
> >> +	 * static configuration, we can write all the static data
> >> +	 * to this one automatic port and still make non-promisc.
> > 
> > 
> > make *this port* non promisc.
> > 
> >> +	 */
> > 
> > It's best to move this comment within list_for_each_entry,
> > near the code it documents.
> > 
> 
> Will fix
> 
> >> +	list_for_each_entry(p, &br->port_list, list) {
> >> +		if (br->auto_cnt == 0 ||
> >> +		    (br->auto_cnt == 1 && br_is_auto_port(p)))
> > 
> > 
> > This is correct but I think it would be clearer as follows:
> > 
> > 	list_for_each_entry(p, &br->port_list, list) {
> > 		/* If this is the only automatic port, others are not automatic so
> > 		 * their output configuration is determined by static fdb entries.
> > 		 * Thus, since input on this port will become output on one of the
> > 		 * others, this means our input configuration is static: write it out to
> > 		 * hardware and disable promiscous mode.
> > 		 * Otherwise, make the port promiscous.
> > 		 */
> > 		if (br->auto_cnt <= br_is_auto_port(p))
> > 				br_port_clear_promisc(p);
> > 			else
> > 				br_port_set_promisc(p);
> > 	}
> > 
> > 
> > instead of enumerating cases in comments and in code like you did above.
> > 
> 
> br_is_auto_port give a boolean.
>  I suppose I can make it return an
> integer, but that IMO makes it more confusing.

In C boolean is 0 or 1 so no need to change it to integer.

> I can questions about
> why do we clear it when there are no automatic ports.

Sorry don't understand this sentence.

You comment doesn't explicitly explain the case of auto_cnt == 0
either, but if you want to enumarate both examples maybe combine the
two comments and list all cases, e.g.

	Cases:
		auto_cnt == 0
			clear promisc
		auto_cnt == 1 br_is_auto_port == 1
			clear promisc
		auto_cnt == 1 br_is_auto_port == 0
			set promisc


> I was trying to
> be very explicit in the conditions.
> 
> Thanks
> -vlad


I think a general rule in code is always better than a
bunch of cases listed.
listing cases might be good for documentation / comments
but you need to check them against code anyway, so the
simpler the code the better.

> > 	
> > 
> > 
> >> +			br_port_clear_promisc(p);
> >> +		else
> >> +			br_port_set_promisc(p);
> >> +	}
> >> +}
> >> +
> >>  static void nbp_update_port_count(struct net_bridge *br)
> >>  {
> >>  	struct net_bridge_port *p;
> >> @@ -94,7 +152,23 @@ static void nbp_update_port_count(struct net_bridge *br)
> >>  		if (br_is_auto_port(p))
> >>  			cnt++;
> >>  	}
> >> -	br->auto_cnt = cnt;
> >> +	if (br->auto_cnt != cnt) {
> >> +		br->auto_cnt = cnt;
> >> +		br_manage_promisc(br);
> >> +	}
> >> +}
> >> +
> >> +static void nbp_delete_promisc(struct net_bridge_port *p)
> >> +{
> >> +	/* If port is currently promiscous, unset promiscuity.
> >> +	 * Otherwise, it is a static port so remove all addresses
> >> +	 * from it.
> >> +	 */
> >> +	dev_set_allmulti(p->dev, -1);
> >> +	if (br_is_promisc_port(p))
> >> +		dev_set_promiscuity(p->dev, -1);
> >> +	else
> >> +		br_fdb_unsync_static(p->br, p);
> >>  }
> >>  
> >>  static void release_nbp(struct kobject *kobj)
> >> @@ -145,7 +219,7 @@ static void del_nbp(struct net_bridge_port *p)
> >>  
> >>  	sysfs_remove_link(br->ifobj, p->dev->name);
> >>  
> >> -	dev_set_promiscuity(dev, -1);
> >> +	nbp_delete_promisc(p);
> >>  
> >>  	spin_lock_bh(&br->lock);
> >>  	br_stp_disable_port(p);
> >> @@ -153,11 +227,10 @@ static void del_nbp(struct net_bridge_port *p)
> >>  
> >>  	br_ifinfo_notify(RTM_DELLINK, p);
> >>  
> >> -	nbp_vlan_flush(p);
> >> -	br_fdb_delete_by_port(br, p, 1);
> >> -
> >>  	list_del_rcu(&p->list);
> >>  
> >> +	nbp_vlan_flush(p);
> >> +	br_fdb_delete_by_port(br, p, 1);
> >>  	nbp_update_port_count(br);
> >>  
> >>  	dev->priv_flags &= ~IFF_BRIDGE_PORT;
> >> @@ -367,7 +440,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
> >>  
> >>  	call_netdevice_notifiers(NETDEV_JOIN, dev);
> >>  
> >> -	err = dev_set_promiscuity(dev, 1);
> >> +	err = dev_set_allmulti(dev, 1);
> >>  	if (err)
> >>  		goto put_back;
> >>  
> >> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> >> index 2023671..1c794fef 100644
> >> --- a/net/bridge/br_private.h
> >> +++ b/net/bridge/br_private.h
> >> @@ -175,6 +175,7 @@ struct net_bridge_port
> >>  #define BR_LEARNING		0x00000020
> >>  #define BR_FLOOD		0x00000040
> >>  #define BR_AUTO_MASK (BR_FLOOD | BR_LEARNING)
> >> +#define BR_PROMISC		0x00000080
> >>  
> >>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> >>  	struct bridge_mcast_query	ip4_query;
> >> @@ -200,6 +201,7 @@ struct net_bridge_port
> >>  };
> >>  
> >>  #define br_is_auto_port(p) (((p)->flags & BR_AUTO_MASK) == BR_AUTO_MASK)
> >> +#define br_is_promisc_port(p) ((p)->flags & BR_PROMISC)
> >>  
> >>  #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
> >>  
> >> -- 
> >> 1.9.0
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html