[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<MW3PR15MB39138C432D2CD843C20C0C10FAB22@MW3PR15MB3913.namprd15.prod.outlook.com>
Date: Tue, 15 Apr 2025 22:13:18 +0000
From: David Wilder <wilder@...ibm.com>
To: Jay Vosburgh <jv@...sburgh.net>
CC: Ilya Maximets <i.maximets@....org>,
"netdev@...r.kernel.org"
<netdev@...r.kernel.org>,
"pradeeps@...ux.vnet.ibm.com"
<pradeeps@...ux.vnet.ibm.com>,
Pradeep Satyanarayana <pradeep@...ibm.com>,
Adrian Moreno Zapata <amorenoz@...hat.com>,
Hangbin Liu <haliu@...hat.com>
Subject: RE: [PATCH net-next v1 1/1] bonding: Adding limmited support for ARP
monitoring with ovs.
>>>>> Adding limited support for the ARP Monitoring feature when ovs is
>>>>> configured above the bond. When no vlan tags are used in the configuration
>>>>> or when the tag is added between the bond interface and the ovs bridge arp
>>>>> monitoring will function correctly. The use of tags between the ovs bridge
>>>>> and the routed interface are not supported.
>>>>
>>>> Looking at the patch, it isn't really "adding support," but
>>>> rather is disabling the "is this IP address configured above the bond"
>>>> checks if the bond is a member of an OVS bridge. It also seems like it
>>>> would permit any ARP IP target, as long as the address is configured
>>>> somewhere on the system.
>>>>
>>>> Stated another way, the route lookup in bond_arp_send_all() for
>>>> the target IP address must return a device, but the logic to match that
>>>> device to the interface stack above the bond will always succeed if the
>>>> bond is a member of any OVS bridge.
>>>>
>>>> For example, given:
>>>>
>>>> [ eth0, eth1 ] -> bond0 -> ovs-br -> ovs-port IP=10.0.0.1
>>>> eth2 IP=20.0.0.2
>>>>
>>>> Configuring arp_ip_target=20.0.0.2 on bond0 would apparently
>>>> succeed after this patch is applied, and the bond would send ARPs for
>>>> 20.0.0.2.
>>>>
>>>>> For example:
>>>>> 1) bond0 -> ovs-br -> ovs-port (x.x.x.x) is supported
>>>>> 2) bond0 -> bond0.100 -> ovs-br -> ovs-port (x.x.x.x) is supported.
>>>>> 3) bond0 -> ovs-br -> ovs-port -> ovs-port.100 (x.x.x.x) is not supported.
>>>>>
>>>>> Configurations #1 and #2 were tested and verified to function corectly.
>>>>> In the second configuration the correct vlan tags were seen in the arp.
>>>>
>>>> Assuming that I'm understanding the behavior correctly, I'm not
>>>> sure that "if OVS then do whatever" is the right way to go, particularly
>>>> since this would still exhibit mysterious failures if a VLAN is
>>>> configured within OVS itself (case 3, above).
>>>
>>> Note: vlan can also be pushed or removed by the OpenFlow pipeline within
>>> openvswitch between the ovs-port and the bond0. So, there is actually no
>>> reliable way to detect the correct set of vlan tags that should be used.
>>> And also, even if the IP is assigned to the ovs-port that is part of the
>>> same OVS bridge, there is no guarantee that packets routed to that IP can
>>> actually egress from the bond0, as the forwarding rules inside the OVS
>>>datapath can be arbitrarily complex.
>>>
>>> And all that is not limited to OVS even, as the cover letter mentions, TC
>>> or nftables in the linux bridge or even eBPF or XDP programs are not that
>>> different complexity-wise and can do most of the same things breaking the
>>> assumptions bonding code makes.
>>>
>>>>
>>>> I understand that the architecture of OVS limits the ability to
>>>> do these sorts of checks, but I'm unconvinced that implementing this
>>>> support halfway is going to create more issues than it solves.
>
> Re-reading my comment, I clearly meant "isn't going to create
> more issues" here.
>
>>>> Lastly, thinking out loud here, I'm generally loathe to add more
>>>> options to bonding, but I'm debating whether this would be worth an
>>>> "ovs-is-a-black-box" option somewhere, so that users would have to
>>>> opt-in to the OVS alternate realm.
>>
>>> I agree that adding options is almost never a great solution. But I had a
>>> similar thought. I don't think this option should be limited to OVS though,
>>>as OVS is only one of the cases where the current verification logic is not
>>>sufficient.
>
> Agreed; I wasn't really thinking about the not-OVS cases when I
>wrote that, but whatever is changed, if anything, should be generic.
>>What if we build on the arp_ip_target setting. Allow for a list of vlan tags
>> to be appended to each target. Something like: arp_ip_target=x.x.x.x[vlan,vlan,...].
>> If a list of tags is omitted it works as before, if a list is supplied assume we know what were doing
>> and use that instead of calling bond_verify_device_path(). An empty list would be valid.
> Hmm, that's ... not too bad; I was thinking more along the lines
>of a "skip the checks" option, but this may be a much cleaner way to do
>it.
> That said, are you thinking that bonding would add the VLAN
>tags, or that some third party would add them? I.e., are you trying to
>accomodate the case wherein OVS, tc, or whatever, is adding a tag?
It would be up to the administrator to add the list of tags to the arp_target list.
I am unsure how a third party could know what tags might be added by other components.
Knowing what tags to add to the list may be hard to figure out in a complicated configuration.
The upside is it should be possible to configure for any list of tags even if difficult.
A "skip the checks" option would be easier to code. If we skip the process of gathering tags
would that eliminate any configurations with any vlan tags?.
>
> -J
>
>>>>
>>>> -J
>>>>
>>>>> Signed-off-by: David J Wilder <wilder@...ibm.com>
>>>>> Signed-off-by: Pradeep Satyanarayana <pradeeps@...ux.vnet.ibm.com>
>>>>> ---
>>>>> drivers/net/bonding/bond_main.c | 8 +++++++-
>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>> index 950d8e4d86f8..6f71a567ba37 100644
>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>> @@ -3105,7 +3105,13 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
>>>>> struct net_device *upper;
>>>>> struct list_head *iter;
>>>>>
>>>>> - if (start_dev == end_dev) {
>>>>> + /* If start_dev is an OVS port then we have encountered an openVswitch
>>>
>>> nit: Strange choice to capitalize 'V'. It should be all lowercase or a full
>>> 'Open vSwitch' instead.
>>
>>>>> + * bridge and can't go any further. The programming of the switch table
>>>>> + * will determine what packets will be sent to the bond. We can make no
>>>>> + * further assumptions about the network above the bond.
>>>>> + */
>>>>> +
>>>>> + if (start_dev == end_dev || netif_is_ovs_port(start_dev)) {
>>>>> tags = kcalloc(level + 1, sizeof(*tags), GFP_ATOMIC);
>>>>> if (!tags)
>>>>> return ERR_PTR(-ENOMEM);
>>>>
>>>> ---
>>>> -Jay Vosburgh, jv@...sburgh.net
>>>
>>> Best regards, Ilya Maximets.
>>
>>David Wilder (wilder@...ibm.com)
>
>---
> -Jay Vosburgh, jv@...sburgh.net
David Wilder (wilder@...ibm.com)
Powered by blists - more mailing lists