Message-ID: <0b5741b6-48c0-0c34-aed8-257f3e203ac5@nvidia.com>
Date: Tue, 26 Jan 2021 16:55:22 +0200
From: Nikolay Aleksandrov <nikolay@...dia.com>
To: Hangbin Liu <liuhangbin@...il.com>
CC: <netdev@...r.kernel.org>, Roopa Prabhu <roopa@...dia.com>,
"David S . Miller" <davem@...emloft.net>,
<bridge@...ts.linux-foundation.org>,
Jarod Wilson <jarod@...hat.com>, Jiri Pirko <jiri@...nulli.us>,
Ivan Vecera <ivecera@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Ido Schimmel <idosch@...sch.org>
Subject: Re: [PATCH net-next] bridge: Propagate NETDEV_NOTIFY_PEERS notifier
On 26/01/2021 15:56, Nikolay Aleksandrov wrote:
> On 26/01/2021 15:25, Hangbin Liu wrote:
>> On Tue, Jan 26, 2021 at 09:40:13AM +0200, Nikolay Aleksandrov wrote:
>>> On 26/01/2021 06:09, Hangbin Liu wrote:
>>>> After adding a bridge as the upper layer of a bond/team, we usually remove
>>>> the IP address from the bond/team and set it on the bridge. When there is a
>>>> failover, the bond/team will not send gratuitous ARP since it has no IP
>>>> address. Then the lower layer (e.g. a VM tap dev) behind the bridge will not
>>>> be able to receive this notification.
>>>>
>>>> Make the bridge able to handle the NETDEV_NOTIFY_PEERS notifier.
>>>>
>>>> Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
>>>> ---
>>>> net/bridge/br.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/net/bridge/br.c b/net/bridge/br.c
>>>> index ef743f94254d..b6a0921bb498 100644
>>>> --- a/net/bridge/br.c
>>>> +++ b/net/bridge/br.c
>>>> @@ -125,6 +125,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
>>>> /* Forbid underlying device to change its type. */
>>>> return NOTIFY_BAD;
>>>>
>>>> + case NETDEV_NOTIFY_PEERS:
>>>> case NETDEV_RESEND_IGMP:
>>>> /* Propagate to master device */
>>>> call_netdevice_notifiers(event, br->dev);
>>>>
>>>
>>> I'm not convinced this should be done by the bridge. Setups usually have multiple ports
>>> which may have link change events, and these events are unrelated, i.e. we shouldn't
>>> generate a gratuitous ARP on all ports every time one of them changes; there might be
>>> many different devices present. We have setups with hundreds of ports which are mixed
>>> types of devices.
>>> That seems inefficient, redundant and can potentially cause problems.
>>
>> Hi Nikolay,
>>
>> Thanks for the reply. There are a few reasons I think the bridge should
>> handle NETDEV_NOTIFY_PEERS:
>>
>> 1. Only a few devices call the NETDEV_NOTIFY_PEERS notifier: bond, team,
>> virtio, xen, 6lowpan. So there should not be many notification messages.
>
> You can't send a broadcast to all ports because one bond's link status has changed.
> That makes no sense; the GARP needs to be sent only on that bond. Bond devices
> are heavily used in bridge setups, and in general the bridge is considered a switch
> device; it shouldn't be broadcasting GARPs to all ports when one changes link state.
>
Scratch the last sentence, I guess you're talking about when the bond's mac causes
the bridge to change its mac address via br_stp_recalculate_bridge_id(). I was wondering
at first why you would need to send a garp, but in fact, as Ido mentioned privately,
it is already handled correctly; you just need to have the arp_notify sysctl set.
Then, if the bridge's mac changes because of the bond flapping, a NETDEV_NOTIFY_PEERS will
be generated. Check:
devinet.c inetdev_event() -> case NETDEV_CHANGEADDR
Alternatively you can always set the bridge mac address manually and then it won't be
changed by such events.
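As a quick sketch of the two workarounds above (the bridge name br0 and the mac
address are placeholders, not from the original setup):

```shell
# Option 1: enable arp_notify on the bridge, so a GARP is generated
# when its address or link changes (per-interface IPv4 sysctl):
sysctl -w net.ipv4.conf.br0.arp_notify=1

# Option 2: pin the bridge mac manually, so bond flaps never cause
# br_stp_recalculate_bridge_id() to change it:
ip link set dev br0 address 02:00:00:aa:bb:cc
```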
>> 2. When the bond/team's upper layer is set to a bridge, the bridge's mac will be
>> the same as the bond/team's. So when the bond/team's mac changes, the bridge's
>> mac will also change, and the bridge should send a GARP to notify others that
>> its mac has changed.
>
> That is not true, the mac doesn't need to be the same at all, and in many
> situations it isn't.
>
>> 3. There is already NETDEV_RESEND_IGMP handling in the bridge, which is also
>> generated by bond/team via netdev_notify_peers(). So why is there IGMP
>> but no ARP?
>
> Apples and oranges..
>
>> 4. If the bridge doesn't have an IP address, it will omit the GARP sending.
>> So having or not having an IP address on the bridge doesn't matter.
>> 5. I don't see how the number of ports affects the bridge sending GARPs.
>
> Bridge broadcasts are notoriously slow because they consider every port. We've seen glean
> traffic take up 100% CPU with only 10k pps. I have patches that fix the situation for
> *some* cases (i.e. where not all ports need to be considered), but in general you can't
> optimize it much, so it's best to avoid sending them altogether.
> Just imagine having a hundred SVIs on top of the bridge: each link flap of some bond/team
> port would lead to a number of broadcast packets equal to the number of SVIs multiplied by
> the number of ports.
> The same thing happens if there are macvlans on top. We have setups with thousands of
> virtual devices and this would just kill them. If it were at all correct behaviour then we
> might look for a solution, but in general it is not. GARPs must be confined only to the
> bond ports which changed state, and not broadcast to all every time.
Again scratch the last part, I misunderstood why you need garps at first.
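To put rough numbers on the broadcast amplification described in the quoted
paragraph (the counts below are hypothetical, purely for illustration):

```shell
# Hypothetical setup sizes, not from any real deployment:
svis=100     # SVIs on top of the bridge
ports=48     # bridge ports
# One link flap would trigger roughly svis * ports broadcast packets:
echo $((svis * ports))
```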
>
>>
>> Please correct me if I missed something.
>>
>>> Also it seems this was proposed a few years back: https://lkml.org/lkml/2018/1/6/135
>>
>> Thanks for this link, cc Stephen for this discussion.
>>
>> Hangbin
>>
>
>