Date:   Tue, 31 Jan 2017 12:43:19 -0800
From:   Roopa Prabhu <roopa@...ulusnetworks.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
CC:     netdev@...r.kernel.org, davem@...emloft.net,
        nikolay@...ulusnetworks.com, tgraf@...g.ch,
        hannes@...essinduktion.org, jbenc@...hat.com, pshelar@....org,
        dsa@...ulusnetworks.com, hadi@...atatu.com
Subject: Re: [PATCH net-next 0/5] bridge: per vlan dst_metadata support

On 1/31/17, 8:41 AM, Stephen Hemminger wrote:
> On Mon, 30 Jan 2017 21:57:10 -0800
> Roopa Prabhu <roopa@...ulusnetworks.com> wrote:
>
>> From: Roopa Prabhu <roopa@...ulusnetworks.com>
>>
>> High level summary:
>> lwt and dst_metadata have enabled vxlan l3 deployments
>> to use a single vxlan netdev for multiple vnis, eliminating the scalability
>> problem of using a separate vxlan netdev per vni. This series tries to
>> do the same for vxlan netdevs in pure l2 bridged networks.
>> Use-case/deployment and details are below.
>>
>> Deployment scenario details:
>> As we know, VXLAN is used to build layer 2 virtual networks across the
>> underlay layer 3 infrastructure. A VXLAN tunnel endpoint (VTEP)
>> originates and terminates VXLAN tunnels, and a VTEP can be a TOR switch
>> or a vswitch in the hypervisor. This patch series mainly
>> focuses on the TOR switch configured as a VTEP. A VXLAN segment ID (vni)
>> along with a vlan id is used to identify layer 2 segments in a VXLAN
>> overlay network. VXLAN bridging is the function provided by VTEPs to terminate
>> VXLAN tunnels and map the VXLAN vni to a traditional end host vlan. This is
>> covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 of RFC 7348.
>> To provide the VXLAN bridging function, a VTEP has to map a vlan to a vni. The RFC
>> says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
>> the original Layer 2 packet, if there is one, before encapsulating the packet
>> into the VXLAN format to transmit it through the underlay network. The remote
>> VTEP devices have information about the VLAN in which the packet will be
>> placed based on their own VLAN-to-VXLAN VNI mapping configurations.
>>
>> Existing solution:
>> Without this patch series, one can deploy such a vtep configuration by
>> adding the local ports and vxlan netdevs into a vlan filtering bridge.
>> The local ports are configured as trunk ports carrying all vlans.
>> A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
>> achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
>> The vxlan netdev only receives traffic corresponding to the vlan it is mapped
>> to. This configuration maps traffic belonging to a vlan to the corresponding
>> vxlan segment.
>>
>>           -----------------------------------
>>          |              bridge               |
>>          |                                   |
>>           -----------------------------------
>>             |100,200       |100 (pvid)    |200 (pvid)
>>             |              |              |
>>            swp1          vxlan1000      vxlan2000
>>                     
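>> A minimal sketch of this existing setup with iproute2 (interface
>> names and addresses are illustrative):
>>
>>     ip link add br0 type bridge vlan_filtering 1
>>     ip link add vxlan1000 type vxlan id 1000 local 10.1.1.1 dstport 4789
>>     ip link add vxlan2000 type vxlan id 2000 local 10.1.1.1 dstport 4789
>>     ip link set swp1 master br0
>>     ip link set vxlan1000 master br0
>>     ip link set vxlan2000 master br0
>>     bridge vlan add dev swp1 vid 100
>>     bridge vlan add dev swp1 vid 200
>>     bridge vlan add dev vxlan1000 vid 100 pvid untagged
>>     bridge vlan add dev vxlan2000 vid 200 pvid untagged
>>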
>> This provides the required vxlan bridging function but poses a
>> scalability problem, since it needs a separate vxlan netdev for each vni.
>>
>> Solution in this patch series:
>> The goal is to use a single vxlan device to carry all vnis, similar
>> to the vxlan collect metadata mode, but additionally allowing the bridge
>> and vxlan driver to carry all the forwarding information and also learn.
>> This implementation uses the existing dst_metadata infrastructure to map
>> a vlan to a tunnel id (an example configuration follows the list below).
>> - vxlan driver changes:
>>     - enable collect metadata mode to be used with learning,
>>       replication and fdb
>>     - A single fdb table hashed by (mac, vni)
>>     - rx path already has the vni
>>     - tx path expects a vni in the packet with dst_metadata and relies
>>       on the learnt or static forwarding information table to forward the packet
>>
>> - Bridge driver changes: per vlan dst_metadata support:
>>     - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
>>       kept the api generic for any tunnel info
>>     - uapi to configure/unconfigure/dump per vlan tunnel data
>>     - new bridge port flag to turn this feature on/off; off by default
>>     - ingress hook:
>>         - if port is a tunnel port, use tunnel info in
>>           attached dst_metadata to map it to a local vlan
>>     - egress hook:
>>         - if port is a tunnel port, use tunnel info attached to vlan
>>           to set dst_metadata on the skb
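>>
>> A rough sketch of the resulting configuration with a single vxlan
>> device in collect metadata (external) mode; the vlan_tunnel flag and
>> tunnel_info option correspond to the new uapi here, and the exact
>> iproute2 syntax may differ once userspace support lands (names
>> illustrative):
>>
>>     ip link add br0 type bridge vlan_filtering 1
>>     ip link add vxlan0 type vxlan external local 10.1.1.1 dstport 4789
>>     ip link set swp1 master br0
>>     ip link set vxlan0 master br0
>>     bridge link set dev vxlan0 vlan_tunnel on
>>     bridge vlan add dev swp1 vid 100
>>     bridge vlan add dev swp1 vid 200
>>     bridge vlan add dev vxlan0 vid 100
>>     bridge vlan add dev vxlan0 vid 200
>>     bridge vlan add dev vxlan0 vid 100 tunnel_info id 1000
>>     bridge vlan add dev vxlan0 vid 200 tunnel_info id 2000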
>>
>> Other approaches tried and vetoed:
>> - tc vlan push/pop and tunnel metadata dst:
>>     - though tc can be used to do part of this, these patches address a deployment
>>       case where the bridge driver's vlan filtering and forwarding
>>       database, along with the vxlan driver's forwarding information table
>>       and learning, are required
>> - making vxlan driver understand vlan-vni mapping:
>>     - I had a series almost ready with this one but soon realized
>>       it duplicated a lot of vlan handling code in the vxlan driver
>>
>> Roopa Prabhu (5):
>>   ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
>>   vxlan: support fdb and learning in COLLECT_METADATA mode
>>   bridge: uapi: add per vlan tunnel info
>>   bridge: per vlan dst_metadata netlink support
>>   bridge: vlan dst_metadata hooks in ingress and egress paths
>>
>>  drivers/net/vxlan.c            |  211 +++++++++++++++++-----------
>>  include/linux/if_bridge.h      |    1 +
>>  include/net/ip_tunnels.h       |    1 +
>>  include/uapi/linux/if_bridge.h |   11 ++
>>  include/uapi/linux/if_link.h   |    1 +
>>  include/uapi/linux/neighbour.h |    1 +
>>  net/bridge/Makefile            |    5 +-
>>  net/bridge/br_forward.c        |    2 +-
>>  net/bridge/br_input.c          |    8 +-
>>  net/bridge/br_netlink.c        |  140 +++++++++++++------
>>  net/bridge/br_netlink_tunnel.c |  296 ++++++++++++++++++++++++++++++++++++++++
>>  net/bridge/br_private.h        |   12 ++
>>  net/bridge/br_private_tunnel.h |   47 +++++++
>>  net/bridge/br_vlan.c           |   24 +++-
>>  net/bridge/br_vlan_tunnel.c    |  203 +++++++++++++++++++++++++++
>>  15 files changed, 837 insertions(+), 126 deletions(-)
>>  create mode 100644 net/bridge/br_netlink_tunnel.c
>>  create mode 100644 net/bridge/br_private_tunnel.h
>>  create mode 100644 net/bridge/br_vlan_tunnel.c
>>
> I still think such complexity should be done with OVS, where the architecture
> is much more flexible, rather than adding lots more special-case hacks into
> the bridge.

But this is just discouraging people from using the bridge driver. Sorry, but I think it is a bit too late for that now :)
A few things:
- Like I have said before, the bridge driver's vlan filtering and forwarding database have been
ideal to offload to switch ASICs. We have many industry standard bridging
networking features deployed using the bridge driver...even the vxlan bridging gateway
I mention in the deployment section above (this patch series just helps with scaling those deployments).
When the bridge driver has all it takes to be deployed on a data center switch today, I don't understand
the argument for keeping newer features away from it. Why not enable the bridge for newer features when people are using it?

- vlan to tunnel-id (or vlan to vxlan id) mapping is not a hack. It is supported on every data center switch
that supports l2 gateway functions today (a Google search will give a few hits).

- dst_metadata propagation is also not a hack. It is generic infrastructure provided by the kernel
that any subsystem can use...and it is already in use in various parts of the kernel today.

- We heavily use the bridge driver forwarding database for our l2 deployments, similar to the routing fib.
With routing protocols like bgp being used as a control plane for l2 overlays
(https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay-07), bgp implementations like quagga will also
now start looking at the bridge forwarding database.
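
For example, external entries on a vxlan port can already be inspected and
installed through the bridge fdb command; a sketch (mac and addresses are
illustrative):

    bridge fdb show br br0
    bridge fdb append 00:11:22:33:44:55 dev vxlan0 dst 10.1.1.2 vni 1000 self static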

- This patchset enables a feature which is off by default, so I am not sure how it adds
additional complexity to the bridge driver.
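
For reference, a sketch of the per-port knob (the flag name follows the new
uapi in this series; iproute2 support lands separately, so the exact syntax
may differ):

    # per-port flag, off by default
    bridge link set dev vxlan0 vlan_tunnel on
    bridge link set dev vxlan0 vlan_tunnel off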

Thanks,
Roopa


