netdev - Re: [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del from switchdev to the bridge

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20211026112525.glv7n2fk27sjqubj@skbuf>
Date:   Tue, 26 Oct 2021 11:25:25 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Nikolay Aleksandrov <nikolay@...dia.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Roopa Prabhu <roopa@...dia.com>,
        Ido Schimmel <idosch@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vladimir Oltean <olteanv@...il.com>,
        Jiri Pirko <jiri@...dia.com>
Subject: Re: [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del
 from switchdev to the bridge

On Tue, Oct 26, 2021 at 01:40:15PM +0300, Nikolay Aleksandrov wrote:
> Hi,
> Interesting way to work around the asynchronous notifiers. :) I went over
> the patch-set and given that we'll have to support and maintain this fragile
> solution (e.g. playing with locking, possible races with fdb changes etc) I'm
> inclined to go with Ido's previous proposition to convert the hash_lock into a mutex
> with delayed learning from the fast-path to get a sleepable context where we can
> use synchronous switchdev calls and get feedback immediately.

Delayed learning means that we'll receive a sequence of packets like this:

            br0--------\
          /    \        \
         /      \        \
        /        \        \
     swp0         swp1    swp2
      |            |        |
   station A   station B  station C

station A sends request to B, station B sends reply to A.
Since the learning of station A's MAC SA races with the reply sent by
station B, it now becomes theoretically possible for the reply packet to
be flooded to station C as well, right? And that was not possible before
(at least assuming an ageing time longer than the round-trip time of these packets).

And that will happen regardless of whether switchdev is used or not.
I don't want to outright dismiss this (maybe I don't fully understand
this either), but it seems like a pretty heavy-handed change.

> That would be the
> cleanest and most straight-forward solution, it'd be less error-prone and easier
> to maintain long term. I plan to convert the bridge hash_lock to a mutex and then
> you can do the synchronous switchdev change if you don't mind and agree of course.

I agree that there are races and implications I haven't fully thought of,
with this temporary dropping of the br->hash_lock. It doesn't appear ideal.

For example,

/* Delete an FDB entry and notify switchdev. */
static int __br_fdb_delete(struct net_bridge *br,
			   const struct net_bridge_port *p,
			   const u8 *addr, u16 vlan,
			   struct netlink_ext_ack *extack)
{
	struct br_switchdev_fdb_wait_ctx wait_ctx;
	struct net_bridge_fdb_entry *fdb;
	int err;

	br_switchdev_fdb_wait_ctx_init(&wait_ctx);

	spin_lock_bh(&br->hash_lock);

	fdb = br_fdb_find(br, addr, vlan);
	if (!fdb || READ_ONCE(fdb->dst) != p) {
		spin_unlock_bh(&br->hash_lock);
		return -ENOENT;
	}

	br_fdb_notify_async(br, fdb, RTM_DELNEIGH, extack, &wait_ctx);

	spin_unlock_bh(&br->hash_lock);

	err = br_switchdev_fdb_wait(&wait_ctx); <- at this stage (more comments below)
	if (err)
		return err;

	/* We've notified rtnl and switchdev once, don't do it again,
	 * just delete.
	 */
	return fdb_delete_by_addr_and_port(br, p, addr, vlan, false);
}

the software FDB still contains the entry, while the hardware doesn't.
And we are no longer holding the lock, so somebody can either add or
delete that entry.

If somebody else tries to concurrently add that entry, it should not
notify switchdev again because it will see that the FDB entry exists,
and we should finally end up deleting it and result in a consistent
state.

If somebody else tries to concurrently delete that entry, it will
probably be from a code path that ignores errors (because the code paths
that don't are serialized by the rtnl_mutex). Switchdev will say "hey,
but I don't have this FDB entry, you've just deleted it", but that will
again be fine.

There seems to be a problem if somebody concurrently deletes that entry,
_and_then_ it gets added back again, all that before we call
fdb_delete_by_addr_and_port(). Because we don't notify switchdev the
second time around, we'll end up with an address in hardware but no
software counterpart.

I don't really know how to cleanly deal with that.

> By the way patches 1-6 can stand on their own, feel free to send them separately. 

Thanks, I will.