Message-ID: <531e75e8-d5d1-407b-d665-aec2a66bf432@nvidia.com>
Date: Tue, 26 Oct 2021 13:40:15 +0300
From: Nikolay Aleksandrov <nikolay@...dia.com>
To: Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org,
Roopa Prabhu <roopa@...dia.com>,
Ido Schimmel <idosch@...dia.com>
Cc: Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Vivien Didelot <vivien.didelot@...il.com>,
Vladimir Oltean <olteanv@...il.com>,
Jiri Pirko <jiri@...dia.com>
Subject: Re: [RFC PATCH net-next 00/15] Synchronous feedback on FDB add/del
from switchdev to the bridge
On 26/10/2021 01:24, Vladimir Oltean wrote:
> Hello, this is me bombarding the list with switchdev FDB changes again.
>
> This series attempts to address one design limitation in the interaction
> between the bridge and switchdev: error codes returned from the
> SWITCHDEV_FDB_ADD_TO_DEVICE and SWITCHDEV_FDB_DEL_TO_DEVICE handlers are
> completely ignored.
>
> There are multiple aspects to that. First of all, drivers have a portion
> that handles those switchdev events in atomic context, and a portion
> that handles them in a private deferred work context. Errors reported
> from both calling contexts are ignored by the bridge, and it is
> desirable to actually propagate both to user space.
>
> Secondly, it is in fact okay that some switchdev errors are ignored.
> The call graph for fdb_notify() is not simple, it looks something like
> this (not complete):
>
> IFLA_BRPORT_FLUSH RTM_NEWNEIGH
> | |
> | {br,nbp}_vlan_delete br_fdb_change_mac_address v
> | | | | fast __br_fdb_add
> | | | del_nbp, br_dev_delete br_fdb_changeaddr | path / | \
> | | | | | | learning / | \
> \ | -------------------- br_fdb_find_delete_local | | | / | \ switchdev event
> \ | | | | | | / | \ listener
> -------------------------- br_fdb_delete_by_port | | | | / | \ |
> | | | | | | / | \ |
> | | | | | | / | \ |
> | | | | | br_fdb_update | br_fdb_external_learn_add
> (RTM_DELNEIGH) br_fdb_delete | | | | | | | |
> | | | | | | | | | gc_work netdevice
> | | | | | | | fdb_add_entry | timer event
> | | fdb_delete_local | | | | listener
> __br_fdb_delete | | | | / br_fdb_cleanup |
> | | | | | / | | br_stp_change_bridge_id
> | | | \ | / | br_fdb_changeaddr |
> | | | \ | / | | |
> fdb_delete_by_addr_and_port | | fdb_insert \ | / ----/ | br_fdb_change_mac_address
> | | | | \ | / / | |
> br_fdb_external_learn_del | | | | br_fdb_cleanup \ | / / | | br_fdb_insert
> | | | | | | \ | / ----/ | | |
> | | | | | | \ | / / fdb_insert
> br_fdb_flush | | | | | | \ | / / --------/
> \---- | | | | | | \ | / / ------/
> \----------- fdb_delete --------------- fdb_notify ---------/
>
> There's not a lot that fast-path learning can do when switchdev
> returns an error.
>
> So this patch set mainly wants to deal with the 2 code paths that are
> triggered by these regular commands:
>
> bridge fdb add dev swp0 00:01:02:03:04:05 master static # __br_fdb_add
> bridge fdb del dev swp0 00:01:02:03:04:05 master static # __br_fdb_delete
>
> In some other, semi-related discussions, Ido Schimmel pointed out that
> it would be nice if user space got some feedback from the actual driver,
> and made some proposals about how that could be done.
> https://patchwork.kernel.org/project/netdevbpf/cover/20210819160723.2186424-1-vladimir.oltean@nxp.com/
> One of the proposals was to call fdb_notify() from sleepable context,
> but Nikolay disliked the idea of introducing deferred work in the bridge
> driver (seems like nobody wants to deal with it).
>
> And since all proposals for dealing with the deferred work inside
> switchdev were also shot down for valid reasons, we are basically left
> with the code we have today as a baseline: the deferred work remains
> private to the driver, and somehow we must propagate an error code and
> an extack from there.
>
> So the approach taken here is to reorganize the code a bit and add some
> hooks in:
> (a) some callers of the fdb_notify() function to initialize a completion
> structure
> (b) some drivers that catch SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE and mark
> that completion structure as done
> (c) some bridge logic that I believe is fairly safe (I'm open to being
> proven wrong) that temporarily drops the &br->hash_lock in order to
> sleep until the completion is done.
>
> There are some further optimizations that can be made. For example, we
> can avoid dropping the hash_lock if there is no switchdev response
> pending, and we can move some of the completion logic into
> br_switchdev.c so that it is compiled out on a CONFIG_NET_SWITCHDEV=n
> build. I haven't done those here, since they aren't exactly trivial;
> I'm mainly looking for high-level feedback at this stage.
>
> The structure of the patch series is:
> - patches 1-6 are me toying around with some code organization while I
> was trying to understand the various call paths better. I like not
> having forward declarations, but if they exist for a reason, I can
> drop these patches.
> - patches 7-10 and 12 are some preparation work that can also be ignored.
> - patches 11 and 13 are where the meat of the series is.
> - patches 14 and 15 are DSA boilerplate so I could test what I'm doing.
>
Hi,
Interesting way to work around the asynchronous notifiers. :) I went over
the patch set, and given that we'll have to support and maintain this fragile
solution (e.g. playing with locking, possible races with fdb changes, etc.), I'm
inclined to go with Ido's earlier proposal: convert the hash_lock into a mutex,
with delayed learning from the fast path to get a sleepable context where we can
make synchronous switchdev calls and get feedback immediately. That would be the
cleanest and most straightforward solution; it'd be less error-prone and easier
to maintain long term. I plan to convert the bridge hash_lock to a mutex, and then
you can do the synchronous switchdev change, if you don't mind and agree of course.
By the way patches 1-6 can stand on their own, feel free to send them separately.
Thanks,
Nik