[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <52b437bc-3f0e-4a5e-ae18-aea6576eb1ad@gmail.com>
Date: Thu, 20 Mar 2025 17:14:19 -0400
From: Joseph Huang <joseph.huang.2024@...il.com>
To: Nikolay Aleksandrov <razor@...ckwall.org>,
Joseph Huang <Joseph.Huang@...min.com>, netdev@...r.kernel.org
Cc: Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Roopa Prabhu <roopa@...dia.com>, Simon Horman <horms@...nel.org>,
linux-kernel@...r.kernel.org, bridge@...ts.linux.dev
Subject: Re: [Patch net-next 0/3] Add support for mdb offload failure
notification
On 3/20/2025 2:17 AM, Nikolay Aleksandrov wrote:
> On 3/19/25 00:42, Joseph Huang wrote:
>> Currently the bridge does not provide real-time feedback to user space
>> on whether or not an attempt to offload an mdb entry was successful.
>>
>> This patch set adds support to notify user space about successful and
>> failed offload attempts, and the behavior is controlled by a new knob
>> mdb_notify_on_flag_change:
>>
>> 0 - the bridge will not notify user space about MDB flag change
>> 1 - the bridge will notify user space about flag change if either
>> MDB_PG_FLAGS_OFFLOAD or MDB_PG_FLAGS_OFFLOAD_FAILED has changed
>> 2 - the bridge will notify user space about flag change only if
>> MDB_PG_FLAGS_OFFLOAD_FAILED has changed
>>
>> The default value is 0.
>>
>> A break-down of the patches in the series:
>>
>> Patch 1 adds offload failed flag to indicate that the offload attempt
>> has failed. The flag is reflected in netlink mdb entry flags.
>>
>> Patch 2 adds the knob mdb_notify_on_flag_change, and notify user space
>> accordingly in br_switchdev_mdb_complete() when the result is known.
>>
>> Patch 3 adds netlink interface to manipulate mdb_notify_on_flag_change
>> knob.
>>
>> This patch set was inspired by the patch series "Add support for route
>> offload failure notifications" discussed here:
>> https://lore.kernel.org/all/20210207082258.3872086-1-idosch@idosch.org/
>>
>> Joseph Huang (3):
>> net: bridge: mcast: Add offload failed mdb flag
>> net: bridge: mcast: Notify on offload flag change
>> net: bridge: Add notify on flag change netlink i/f
>>
>> include/uapi/linux/if_bridge.h | 9 +++++----
>> include/uapi/linux/if_link.h | 14 ++++++++++++++
>> net/bridge/br_mdb.c | 30 +++++++++++++++++++++++++-----
>> net/bridge/br_multicast.c | 25 +++++++++++++++++++++++++
>> net/bridge/br_netlink.c | 21 +++++++++++++++++++++
>> net/bridge/br_private.h | 26 +++++++++++++++++++++-----
>> net/bridge/br_switchdev.c | 31 ++++++++++++++++++++++++++-----
>> 7 files changed, 137 insertions(+), 19 deletions(-)
>>
>
> Hi,
> Could you please share more about the motivation - why do you need this and
> what will be using it?
Hi Nik,
The API for a user space application to join a multicast group is
write-only (and really best-efforts only), meaning that after an
application calls setsockopt(), the application has no way to know
whether the operation actually succeeded or not. Normally for soft
bridges this is not an issue; however for switchdev-backed bridges, due
to limited hardware resources, the failure rate is meaningfully higher.
With this patch set, the user space application will now get a
notification about a failed attempt to join a multicast group. The user
space application can then have the opportunity to mitigate the failure
[1][2].
> Also why do you need an option with 3 different modes
> instead of just an on/off switch for these notifications?
>
> Thanks,
> Nik
>
Some user space application might be interested in both successful and
failed offload attempts (for example the application might want to keep
an mdb database which is perfectly in sync with the hardware), while
some other user space application might only be interested in failed
attempts (so that it can retry the operation or choose a different group
for example).
This knob is modeled after fib_notify_on_flag_change knob on route
offload failure notification (see
https://lore.kernel.org/all/20210207082258.3872086-4-idosch@idosch.org/).
The rationale is that "Separate value (read: 2) is added for such
notifications because there are less of them, so they do not impact
performance and some users will find them more important."
Thanks,
Joseph
--
[1]
https://datatracker.ietf.org/doc/draft-ietf-pim-zeroconf-mcast-addr-alloc-ps/,
section 2, the last paragraph
[2]
https://datatracker.ietf.org/doc/draft-ietf-pim-ipv6-zeroconf-assignment/,
section 2.1, the first paragraph
Powered by blists - more mailing lists