Date:   Tue, 26 Oct 2021 16:54:25 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Nikolay Aleksandrov <nikolay@...dia.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Roopa Prabhu <roopa@...dia.com>,
        Ido Schimmel <idosch@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Andrew Lunn <andrew@...n.ch>,
        Florian Fainelli <f.fainelli@...il.com>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Vladimir Oltean <olteanv@...il.com>,
        Jiri Pirko <jiri@...dia.com>
Subject: Re: [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del
 from switchdev to the bridge

On Tue, Oct 26, 2021 at 03:20:03PM +0300, Nikolay Aleksandrov wrote:
> On 26/10/2021 14:25, Vladimir Oltean wrote:
> > On Tue, Oct 26, 2021 at 01:40:15PM +0300, Nikolay Aleksandrov wrote:
> >> Hi,
> >> Interesting way to work around the asynchronous notifiers. :) I went over
> >> the patch set, and given that we'll have to support and maintain this fragile
> >> solution (e.g. playing with locking, possible races with fdb changes, etc.), I'm
> >> inclined to go with Ido's earlier proposal to convert the hash_lock into a mutex,
> >> with delayed learning from the fast path to get a sleepable context where we can
> >> use synchronous switchdev calls and get feedback immediately.
> > 
> > Delayed learning means that we'll receive a sequence of packets like this:
> > 
> >             br0--------\
> >           /    \        \
> >          /      \        \
> >         /        \        \
> >      swp0         swp1    swp2
> >       |            |        |
> >    station A   station B  station C
> > 
> > Station A sends a request to B, and station B sends a reply to A.
> > Since the learning of station A's MAC SA races with the reply sent by
> > station B, it now becomes theoretically possible for the reply packet to
> > be flooded to station C as well, right? And that was not possible before
> > (at least assuming an ageing time longer than the round-trip time of these
> > packets).
> > 
> > And that will happen regardless of whether switchdev is used or not.
> > I don't want to outright dismiss this (maybe I don't fully understand
> > this either), but it seems like a pretty heavy-handed change.
> > 
> 
> It will, depending on lock contention. I plan to add a fast/uncontended case with
> trylock from the fast path, and if that fails then queue the fdb, but yes - in general

I wonder why mutex_trylock has this comment:

 * This function must not be used in interrupt context. The
 * mutex must be released by the same task that acquired it.

> you are correct that the traffic could get flooded in the queued case before the
> delayed learning processes the entry; it's a trade-off if we want a sleepable
> learning context. Ido noted privately that this is usually how hardware acts
> anyway; also, if people want guarantees that the reply won't get flooded, there
> are other methods to achieve that (ucast flood disable, firewall rules, etc.).

Not all hardware is like that. The switches I'm working with, which
perform autonomous learning, all complete the learning process for a
frame strictly before they start the forwarding process, and the
software bridge behaves the same way. My only concern is that we might
start building on top of fundamental bridge changes like these, which
risk a revert a few months down the line, when somebody notices and
comes up with a use case where that is not acceptable.

> Today the reply could get flooded if the entry can't be programmed, too: e.g. the
> atomic allocation might fail and we'll flood it again. Granted, that's much less
> likely, but there have never been any such guarantees. I think this is generally
> a good improvement and will simplify a lot of processing complexity. We can bite
> the bullet and get the underlying delayed infrastructure correct once, now; then
> the locking rules and other use cases will be easier to enforce and reason about
> in the future.

You're the maintainer; I certainly won't complain if we go down this path.
It would be nice if br->lock could also be transformed into a mutex, as
that would make all of switchdev much simpler.
