Message-ID: <E4CD12F19ABA0C4D8729E087A761DC3505DB592C@ORSMSX101.amr.corp.intel.com>
Date: Tue, 16 Dec 2014 22:46:48 +0000
From: "Arad, Ronen" <ronen.arad@...el.com>
To: B Viswanath <marichika4@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: John Fastabend <john.fastabend@...il.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
"sfeldma@...il.com" <sfeldma@...il.com>,
"bcrl@...ck.org" <bcrl@...ck.org>, "tgraf@...g.ch" <tgraf@...g.ch>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"linville@...driver.com" <linville@...driver.com>,
"vyasevic@...hat.com" <vyasevic@...hat.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"shm@...ulusnetworks.com" <shm@...ulusnetworks.com>,
"gospo@...ulusnetworks.com" <gospo@...ulusnetworks.com>
Subject: RE: [PATCH net-next v2 2/4] swdevice: add new api to set and del
bridge port attributes
> -----Original Message-----
> From: B Viswanath [mailto:marichika4@...il.com]
> Sent: Tuesday, December 16, 2014 11:52 PM
> To: Arad, Ronen
> Cc: netdev@...r.kernel.org; John Fastabend; Roopa Prabhu; Jamal Hadi
> Salim; Jiri Pirko; sfeldma@...il.com; bcrl@...ck.org; tgraf@...g.ch;
> stephen@...workplumber.org; linville@...driver.com;
> vyasevic@...hat.com; davem@...emloft.net;
> shm@...ulusnetworks.com; gospo@...ulusnetworks.com
> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and del
> bridge port attributes
>
> On 17 December 2014 at 02:22, Arad, Ronen <ronen.arad@...el.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: netdev-owner@...r.kernel.org [mailto:netdev-
> >> owner@...r.kernel.org] On Behalf Of B Viswanath
> >> Sent: Tuesday, December 16, 2014 9:24 PM
> >> To: Arad, Ronen
> >> Cc: John Fastabend; netdev@...r.kernel.org; Roopa Prabhu; Jamal Hadi
> >> Salim; Jiri Pirko; sfeldma@...il.com; bcrl@...ck.org; tgraf@...g.ch;
> >> stephen@...workplumber.org; linville@...driver.com;
> >> vyasevic@...hat.com; davem@...emloft.net; shm@...ulusnetworks.com;
> >> gospo@...ulusnetworks.com
> >> Subject: Re: [PATCH net-next v2 2/4] swdevice: add new api to set and
> >> del bridge port attributes
> >>
> >> Hi,
> >>
> >> This is my first email on this thread, and on this list. My apologies
> >> if I have not understood something correctly. I would like to
> >> participate in this discussion, which is one of the reasons I joined
> >> this list recently. Some feedback inline below.
> >>
> >> On 16 December 2014 at 22:59, Arad, Ronen <ronen.arad@...el.com> wrote:
> >> >
> >> >
> > <snipped for brevity>
> >> >>
> >> >> I'm still missing why there are duplicate implementations in the driver.
> >> >> If the driver implements the set of ndo ops why should it care who
> >> >> calls them? I think you tried to explain this already but I'm not seeing it.
> >> >>
> >> >
> >> > Let's consider a bridge property. I'll use the default PVID
> >> > attribute as an example. It is currently configurable via sysfs
> >> > only; netlink support for it is still due. Let's assume for our
> >> > discussion that a DEFAULT_PVID attribute will be added as a bridge
> >> > attribute within the AFSPEC nested attribute of the AF_BRIDGE
> >> > SETLINK message.
> >> > When a bridge device is present, this attribute is processed by the
> >> > bridge module and saved as the default_pvid field in the net_bridge
> >> > structure. When a switch port is enslaved to a bridge, the bridge
> >> > driver creates a net_bridge_port instance and assigns it a pvid
> >> > inherited from the default_pvid attribute of the bridge. Setting the
> >> > pvid for a newly enslaved switch port is not done via netlink. It
> >> > only applies to the net_bridge_port structure, which is internal to
> >> > the bridge module. Offloading this to HW is not addressed by the
> >> > current bridge offloading.
> >> >
> >> > When a bridge device is not used, the DEFAULT_PVID will be
> >> > targeted, using the SELF flag, at any of the switch ports. The
> >> > driver will recognize that as a bridge port and will need to
> >> > maintain some switch-global structure, similar to net_bridge, where
> >> > it could save the default_pvid. The driver, knowing that the switch
> >> > port is not enslaved to a bridge, will have to replicate the same
> >> > functionality. In the HW case, it will have to configure the
> >> > default VLAN on all the switch ports.
> >> > This is different from the yet-to-be-defined way of propagating the
> >> > default PVID from a bridge device to offloaded bridge ports.
> >> >
> >> > Another example is STP. STP attributes are bridge attributes which
> >> > are not offloaded when a bridge device is present. The bridge
> >> > module handles the STP protocol internally. Without a bridge
> >> > device, STP attributes have to be targeted at a switch port device,
> >> > and the driver should save them in driver-specific structures and
> >> > have a proprietary implementation of STP (as the one in the bridge
> >> > module is not used).
> >>
> >> In general I feel that the switch-device and port relation should be
> >> that of the 'container-containee'. This is the actual physical
> >> relationship. Apart from some operations such as vlans and protocol
> >> related, it is tricky to model all operations directly on ports. My
> >> thinking is it is cleaner to have all operations be on switch-device,
> >> which in turn percolates the operations downward, to its contained
> >> ports as applicable. The offloading is really a property of the
> >> switch device and not individual ports. Similarly the FDB is
> >> maintained by the switch and not the ports. As we extend the current
> >> offloading mechanism to other L2, L3 and other features, we may find
> >> it easier to have a 'switch-device' in place.
> >>
> >> I am somewhat confused by the notion of bridges, though. Many
> >> existing Linux-based routers use bridges differently than as a
> >> vlan-broadcast-domain. For example, it is common to have eth0.334 and
> >> eth1 in the same bridge. What is being done internally is that the
> >> additional vlan tag 334 (which indicates video traffic, say) is
> >> removed and that video traffic is being bridged to eth1. There is no
> >> default vlan for this bridge. This is a software bridge. I am not
> >> sure how this can be accomplished if there is a need to associate a
> >> vlan with a bridge.
> >>
> >> Thanks
> >> Viswanath
> >>
> >>
> >
> > Let's say we have three ports sp1, sp2, sp3 and two VLANs 10 and 20.
> > VLAN 10 is allowed on ports sp1 and sp2, and VLAN 20 is allowed on
> > ports sp2 and sp3.
> > Let's say sp1 and sp3 are access ports and carry untagged traffic. sp2
> > is an uplink trunk port and carries tagged traffic.
> >
> > This could be modeled using Linux bridge in at least two ways:
> >
> > 1) Bridge per VLAN
> > - Two bridge devices are used, br10 and br20
> > - sp2.10 is a vlan interface on sp2 for VLAN 10
> > - sp2.20 is a vlan interface on sp2 for VLAN 20
> > - br10 enslaves ports sp1 and sp2.10
> > - br20 enslaves ports sp3 and sp2.20
> > - br10 and br20 could be L3 interfaces and have IP addresses assigned
> >   to them. This allows for routing between VLANs.
> > - Traffic sent to br10 will egress untagged on sp1 and tagged with
> >   VID=10 on sp2
> > - Traffic sent to br20 will egress untagged on sp3 and tagged with
> >   VID=20 on sp2
> > - Traffic received on sp1 is delivered to br10 and could be flooded if
> >   the MAC DA is broadcast or unknown unicast.
> > - Traffic received on sp2 with VID=10 is delivered to br10 after the
> >   VLAN tag is removed.
> > - Similarly, traffic received on sp3, or tagged traffic with VID=20 on
> >   sp2, is delivered to br20
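
As a rough illustration of the tag handling in model 1, here is a minimal Python sketch. The MEMBERS table, classify() and egress() are names made up for this example, not kernel interfaces, and only the cases listed above are modeled (for instance, tagged frames arriving on the access ports are not covered).

```python
# Hypothetical model of "bridge per VLAN"; not kernel code.
# For each bridge: enslaved port -> VID of the vlan interface used,
# or None when the raw port itself is enslaved (untagged traffic).
MEMBERS = {
    "br10": {"sp1": None, "sp2": 10},  # sp1 and sp2.10
    "br20": {"sp3": None, "sp2": 20},  # sp3 and sp2.20
}

# (port, tag seen on the wire) -> bridge that receives the frame
LOOKUP = {
    (port, vid): bridge
    for bridge, members in MEMBERS.items()
    for port, vid in members.items()
}


def classify(port, tag):
    """Return the bridge a received frame is delivered to, or None."""
    return LOOKUP.get((port, tag))


def egress(bridge):
    """Return {port: tag or None} for traffic sent to the bridge."""
    return dict(MEMBERS[bridge])
```

For example, classify("sp2", 10) yields "br10", and egress("br20") reports sp3 as untagged and sp2 as tagged with VID=20.
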
>
> Thanks for the explanation. Understood this, and this is how I
> saw/implemented things so far. However, my understanding is that both
> br10 and br20 have nothing to do with vlan 10 and 20 respectively. The traffic
> the actual bridges see (say if we ran tcpdump on br10 or br20) is always
> untagged. The tagging happens while the packets are egressing the sp2 port.
> In that sense, there is no vlan associated with either of the bridges.
>
> What I was not sure about was how we can associate a vlan with such a
> bridge in this case, and what it really means for the traffic.
>
> >
> > 2) Single bridge with VLAN filtering
> > - A single bridge device br0 is used
> > - All ports sp1, sp2, and sp3 are enslaved to the bridge
> > - br0 allows both VID=10 and VID=20
> > - VLAN policy on the ports determines the bridging domains within the
> >   bridge
> > - sp1 allows VID=10, it is untagged on egress, and it is the PVID of
> >   sp1. Linux does not provide a way to allow only untagged traffic to
> >   enter a port. Both tagged traffic with VID=10 and untagged traffic
> >   are considered received on VLAN 10.
> > - sp3 allows VID=20, it is untagged on egress, and it is the PVID of
> >   sp3.
> > - sp2 allows VID=10 and VID=20, both are tagged on egress, and there
> >   is no PVID, so untagged traffic received on sp2 is dropped.
> >
> > - The above configuration is sufficient for L2 switching with two
> >   distinct VLANs.
> > - L3 routing across VLANs in this model is achieved by vlan interfaces
> >   on the bridge: br0.10 for VLAN 10 and br0.20 for VLAN 20.
> > - br0.10 and br0.20 are L3 interfaces and have IP addresses assigned.
> > - br0.10 allows VID=10
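
The per-port VLAN policy of model 2 can be sketched in a few lines of Python. This is only an illustration: PortVlanPolicy, ingress_vlan() and flood() are names invented here, not bridge module interfaces, and I assume sp3 (the second access port) has PVID 20.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PortVlanPolicy:
    """Hypothetical per-port VLAN policy; not a kernel structure."""
    allowed: Set[int]                                # VIDs permitted on the port
    untagged: Set[int] = field(default_factory=set)  # VIDs sent untagged on egress
    pvid: Optional[int] = None                       # VLAN for untagged ingress frames


PORTS = {
    "sp1": PortVlanPolicy(allowed={10}, untagged={10}, pvid=10),
    "sp2": PortVlanPolicy(allowed={10, 20}),  # trunk: tagged, no PVID
    "sp3": PortVlanPolicy(allowed={20}, untagged={20}, pvid=20),
}


def ingress_vlan(port, tag):
    """Return the VLAN a frame is classified into, or None if dropped."""
    policy = PORTS[port]
    vid = tag if tag is not None else policy.pvid
    if vid is None or vid not in policy.allowed:
        return None
    return vid


def flood(in_port, tag):
    """Return {egress_port: tag or None} for a frame flooded in its VLAN."""
    vid = ingress_vlan(in_port, tag)
    if vid is None:
        return {}
    out = {}
    for name, policy in PORTS.items():
        if name == in_port or vid not in policy.allowed:
            continue
        out[name] = None if vid in policy.untagged else vid
    return out
```

With this table, an untagged frame on sp1 floods to sp2 tagged with VID=10, an untagged frame on sp2 is dropped, and a frame tagged VID=20 on sp2 leaves sp3 untagged.
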
>
> In this case, am I correct in assuming that br0 actually represents the
> switch device? I don't see a way in which one more such bridge can exist,
> since the forwarding decisions are handled at the switch-device level.
>
br0 represents the HW switch as far as configuration is concerned.
Offloading (discussed in this patch set) will propagate bridge and port
attributes to the underlying switch port devices for HW configuration.
Bypassing data-path handling by the software bridge is yet to be defined.
On the transmit path (i.e. packets originated by the switch OS or
applications) it could be OK to use software bridging, but it could be
inefficient for multicast or flooding. An alternative that hands packets
sent to the bridge, or to one of its vlan interfaces (br0.10, br0.20), to
the HW switch is desirable.
On the receive path, packets that the switch driver receives from the HW
and delivers on a switch port device (such as sp1, sp2, sp3) must not be
forwarded again by the software bridge, as that could cause duplicate
packets (packets are flooded in the HW and again by the software bridge).
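
To make the receive-path concern concrete, here is a minimal Python sketch of one possible approach. It is not something defined by this patch set. It assumes the driver can mark a frame as already flooded in HW (the hw_flooded flag is hypothetical) and that the software bridge then skips ports belonging to the same HW switch. tap0 is an assumed software-only bridge port added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    in_port: str
    hw_flooded: bool  # hypothetical mark set by the switch driver


# Bridge port -> HW switch it belongs to (None for a software-only port).
BRIDGE_PORTS = {"sp1": "hw0", "sp2": "hw0", "sp3": "hw0", "tap0": None}


def software_flood_targets(frame):
    """Return the ports the software bridge should still flood to."""
    src_switch = BRIDGE_PORTS[frame.in_port]
    targets = []
    for port, switch in BRIDGE_PORTS.items():
        if port == frame.in_port:
            continue
        # The HW already delivered a copy to ports on the same switch.
        if frame.hw_flooded and switch is not None and switch == src_switch:
            continue
        targets.append(port)
    return targets
```

A frame received on sp1 and marked as flooded in HW is then forwarded in software only to tap0, while an unmarked frame is flooded to all other ports as usual.
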
> Thanks
> Viswanath
>
> >
> >
> >> >
> >> >
> >> >> [...]
> >> >>
> >> >> I'll need to think about the l3 stuff but I think Jiri/Scott/Roopa
> >> >> might have worked some of it out.
> >> >>
> >> >> --
> >> >> John Fastabend Intel Corporation
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe netdev"
> >> > in the body of a message to majordomo@...r.kernel.org
> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html