lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 24 May 2020 19:24:27 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Ido Schimmel <idosch@...sch.org>
Cc:     Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jiri Pirko <jiri@...nulli.us>,
        Jakub Kicinski <kuba@...nel.org>,
        Ivan Vecera <ivecera@...hat.com>,
        netdev <netdev@...r.kernel.org>,
        Horatiu Vultur <horatiu.vultur@...rochip.com>,
        "Allan W. Nielsen" <allan.nielsen@...rochip.com>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>,
        Roopa Prabhu <roopa@...ulusnetworks.com>
Subject: Re: [PATCH RFC net-next 00/13] RX filtering for DSA switches

On Sun, 24 May 2020 at 17:07, Ido Schimmel <idosch@...sch.org> wrote:
>
> On Fri, May 22, 2020 at 12:10:23AM +0300, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@....com>
> >
> > This is a WIP series whose stated goal is to allow DSA and switchdev
> > drivers to flood less traffic to the CPU while keeping the same level of
> > functionality.
> >
> > The strategy is to whitelist towards the CPU only the {DMAC, VLAN} pairs
> > that the operating system has expressed its interest in, either due to
> > those being the MAC addresses of one of the switch ports, or addresses
> > added to our device's RX filter via calls to dev_uc_add/dev_mc_add.
> > Then, the traffic which is not explicitly whitelisted is not sent by the
> > hardware to the CPU, under the assumption that the CPU didn't ask for it
> > and would have dropped it anyway.
> >
> > The ground for these patches were the discussions surrounding RX
> > filtering with switchdev in general, as well as with DSA in particular:
> >
> > "[PATCH net-next 0/4] DSA: promisc on master, generic flow dissector code":
> > https://www.spinics.net/lists/netdev/msg651922.html
> > "[PATCH v3 net-next 2/2] net: dsa: felix: Allow unknown unicast traffic towards the CPU port module":
> > https://www.spinics.net/lists/netdev/msg634859.html
> > "[PATCH v3 0/2] net: core: Notify on changes to dev->promiscuity":
> > https://lkml.org/lkml/2019/8/29/255
> > LPC2019 - SwitchDev offload optimizations:
> > https://www.youtube.com/watch?v=B1HhxEcU7Jg
> >
> > Unicast filtering comes to me as most important, and this includes
> > termination of MAC addresses corresponding to the network interfaces in
> > the system (DSA switch ports, VLAN sub-interfaces, bridge interface).
> > The first 4 patches use Ivan Khoronzhuk's IVDF framework for extending
> > network interface addresses with a Virtual ID (typically VLAN ID). This
> > matches DSA switches perfectly because their FDB already contains keys
> > of the {DMAC, VID} form.
>
> Hi,
>
> I read through the series and I'm not sure how unicast filtering works.
> Instead of writing a very long mail I just created a script with
> comments. I think it's clearer that way. Note that this is not a made up
> configuration. It is used in setups involving VRRP / VXLAN, for example.
>
> ```
> #!/bin/bash
>
> ip netns add ns1
>
> ip -n ns1 link add name br0 type bridge vlan_filtering 1
> ip -n ns1 link add name dummy10 up type dummy
>
> ip -n ns1 link set dev dummy10 master br0
> ip -n ns1 link set dev br0 up
>
> ip -n ns1 link add link br0 name vlan10 up type vlan id 10
> bridge -n ns1 vlan add vid 10 dev br0 self
>
> echo "Before adding macvlan:"
> echo "======================"
>
> echo -n "Promiscuous mode: "
> ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
>
> echo -e "\nvlan10's MAC is in br0's FDB:"
> bridge -n ns1 fdb show br0 vlan 10
>
> echo
> echo "After adding macvlan:"
> echo "====================="
>
> ip -n ns1 link add link vlan10 name vlan10-v up address 00:00:5e:00:01:01 \
>         type macvlan mode private
>
> echo -n "Promiscuous mode: "
> ip -n ns1 -j -p -d link show dev br0 | jq .[][\"promiscuity\"]
>
> echo -e "\nvlan10-v's MAC is not in br0's FDB:"
> bridge -n ns1 fdb show br0 | grep master | grep 00:00:5e:00:01:01
> ```
>
> This is the output on my laptop (kernel 5.6.8):
>
> ```
> Before adding macvlan:
> ======================
> Promiscuous mode: 0
>
> vlan10's MAC is in br0's FDB:
> 42:bd:b1:cc:67:15 dev br0 vlan 10 master br0 permanent
>
> After adding macvlan:
> =====================
> Promiscuous mode: 1
>
> vlan10-v's MAC is not in br0's FDB:
> ```
>
> Basically, if the MAC of the VLAN device is not inherited from the
> bridge or you stack macvlans on top, then the bridge will go into
> promiscuous mode and it will locally receive all frames passing through
> it. It's not ideal, but it's a very old and simple behavior. It does not
> require you to track the VLAN associated with the MAC addresses, for
> example.
>

This is a good point. I wasn't aware that the bridge 'gives up' with
macvlan upper devices, but if I understand correctly, we do have the
necessary tools to improve that.
But actually, I'm wondering if this simple behavior from the bridge is
correct. As you, Jiri and Ivan pointed out in last summer's email
thread about the Linux bridge and promiscuous mode, putting the
interface in IFF_PROMISC is only going to guarantee acceptance through
the net device's RX filter, but not that the packets will go to the
CPU. So from that perspective, the current series would break things,
so we should definitely fix that and keep the {MAC, VLAN} pairs in the
bridge's local FDB.

> When you are offloading the Linux data path to hardware this behavior is
> not ideal as your hardware can handle much higher packet rates than the
> CPU.
>
> In mlxsw we handle this by tracking the upper devices of the bridge. I
> was hoping that with Ivan's patches we could add support for unicast
> filtering in the bridge driver and program the MAC addresses to its FDB
> with 'local' flag. Then the FDB entries would be notified via switchdev
> to device drivers.
>

Yes, it should be possible to do that. I'll try and see how far I get.

> >
> > Multicast filtering was taken and reworked from Florian Fainelli's
> > previous attempts, according to my own understanding of multicast
> > forwarding requirements of an IGMP snooping switch. This is the part
> > that needs the most extra work, not only in the DSA core but also in
> > drivers. For this reason, I've left out of this patchset anything that
> > has to do with driver-level configuration (since the audience is a bit
> > larger than usual), as I'm trying to focus more on policy for now, and
> > the series is already pretty huge.
>
> From what I remember, this is the logic in the Linux bridge:
>
> * Broadcast is always locally received
> * Multicast is locally received if:
>         * Snooping disabled
>         * Snooping enabled:
>                 * Bridge netdev is mrouter port
>                 or
>                 * Matches MDB entry with 'host_joined' indication
>
> >
> > Florian Fainelli (3):
> >   net: bridge: multicast: propagate br_mc_disabled_update() return
> >   net: dsa: add ability to program unicast and multicast filters for CPU
> >     port
> >   net: dsa: wire up multicast IGMP snooping attribute notification
> >
> > Ivan Khoronzhuk (4):
> >   net: core: dev_addr_lists: add VID to device address
> >   net: 8021q: vlan_dev: add vid tag to addresses of uc and mc lists
> >   net: 8021q: vlan_dev: add vid tag for vlan device own address
> >   ethernet: eth: add default vid len for all ethernet kind devices
> >
> > Vladimir Oltean (6):
> >   net: core: dev_addr_lists: export some raw __hw_addr helpers
> >   net: dsa: don't use switchdev_notifier_fdb_info in
> >     dsa_switchdev_event_work
> >   net: dsa: mroute: don't panic the kernel if called without the prepare
> >     phase
> >   net: bridge: add port flags for host flooding
> >   net: dsa: deal with new flooding port attributes from bridge
> >   net: dsa: treat switchdev notifications for multicast router connected
> >     to port
> >
> >  include/linux/if_bridge.h |   3 +
> >  include/linux/if_vlan.h   |   2 +
> >  include/linux/netdevice.h |  11 ++
> >  include/net/dsa.h         |  17 +++
> >  net/8021q/Kconfig         |  12 ++
> >  net/8021q/vlan.c          |   3 +
> >  net/8021q/vlan.h          |   2 +
> >  net/8021q/vlan_core.c     |  25 ++++
> >  net/8021q/vlan_dev.c      | 102 +++++++++++---
> >  net/bridge/br_if.c        |  40 ++++++
> >  net/bridge/br_multicast.c |  21 ++-
> >  net/bridge/br_switchdev.c |   4 +-
> >  net/core/dev_addr_lists.c | 144 +++++++++++++++----
> >  net/dsa/Kconfig           |   1 +
> >  net/dsa/dsa2.c            |   6 +
> >  net/dsa/dsa_priv.h        |  27 +++-
> >  net/dsa/port.c            | 155 ++++++++++++++++----
> >  net/dsa/slave.c           | 288 +++++++++++++++++++++++++++++++-------
> >  net/dsa/switch.c          |  36 +++++
> >  net/ethernet/eth.c        |  12 +-
> >  20 files changed, 780 insertions(+), 131 deletions(-)
> >
> > --
> > 2.25.1
> >

Thanks,
-Vladimir

Powered by blists - more mailing lists