[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<MW3PR15MB3913966CE6D332CFDC1982A8FABF2@MW3PR15MB3913.namprd15.prod.outlook.com>
Date: Fri, 18 Apr 2025 19:20:20 +0000
From: David Wilder <wilder@...ibm.com>
To: Jay Vosburgh <jv@...sburgh.net>
CC: Ilya Maximets <i.maximets@....org>,
"netdev@...r.kernel.org"
<netdev@...r.kernel.org>,
"pradeeps@...ux.vnet.ibm.com"
<pradeeps@...ux.vnet.ibm.com>,
Pradeep Satyanarayana <pradeep@...ibm.com>,
Adrian Moreno Zapata <amorenoz@...hat.com>,
Hangbin Liu <haliu@...hat.com>
Subject: RE: [PATCH net-next v1 1/1] bonding: Adding limmited support for ARP
monitoring with ovs.
>>>>>>>>> Adding limited support for the ARP Monitoring feature when ovs is
>>>>>>>>> configured above the bond. When no vlan tags are used in the configuration
>>>>>>>>> or when the tag is added between the bond interface and the ovs bridge arp
>>>>>>>>> monitoring will function correctly. The use of tags between the ovs bridge
>>>>>>>>> and the routed interface are not supported.
>>>>>>>>
>>>>>>>> Looking at the patch, it isn't really "adding support," but
>>>>>>>> rather is disabling the "is this IP address configured above the bond"
>>>>>>>> checks if the bond is a member of an OVS bridge. It also seems like it
>>>>>>>> would permit any ARP IP target, as long as the address is configured
>>>>>>>> somewhere on the system.
>>>>>>>>
>>>>>>>> Stated another way, the route lookup in bond_arp_send_all() for
>>>>>>>> the target IP address must return a device, but the logic to match that
>>>>>>>> device to the interface stack above the bond will always succeed if the
>>>>>>>> bond is a member of any OVS bridge.
>>>>>>>>
>>>>>>>> For example, given:
>>>>>>>>
>>>>>>>> [ eth0, eth1 ] -> bond0 -> ovs-br -> ovs-port IP=10.0.0.1
>>>>>>>> eth2 IP=20.0.0.2
>>>>>>>>
>>>>>>>> Configuring arp_ip_target=20.0.0.2 on bond0 would apparently
>>>>>>>> succeed after this patch is applied, and the bond would send ARPs for
>>>>>>>> 20.0.0.2.
>>>>>>>>
>>>>>>>>> For example:
>>>>>>>>> 1) bond0 -> ovs-br -> ovs-port (x.x.x.x) is supported
>>>>>>>>> 2) bond0 -> bond0.100 -> ovs-br -> ovs-port (x.x.x.x) is supported.
>>>>>>>>> 3) bond0 -> ovs-br -> ovs-port -> ovs-port.100 (x.x.x.x) is not supported.
>>>>>>>>>
>>>>>>>>> Configurations #1 and #2 were tested and verified to function corectly.
>>>>>>>>> In the second configuration the correct vlan tags were seen in the arp.
>>>>>>>>
>>>>>>>> Assuming that I'm understanding the behavior correctly, I'm not
>>>>>>>> sure that "if OVS then do whatever" is the right way to go, particularly
>>>>>>>> since this would still exhibit mysterious failures if a VLAN is
>>>>>>>> configured within OVS itself (case 3, above).
>>>>>>>
>>>>>>> Note: vlan can also be pushed or removed by the OpenFlow pipeline within
>>>>>>> openvswitch between the ovs-port and the bond0. So, there is actually no
>>>>>>> reliable way to detect the correct set of vlan tags that should be used.
>>>>>>> And also, even if the IP is assigned to the ovs-port that is part of the
>>>>>>> same OVS bridge, there is no guarantee that packets routed to that IP can
>>>>>>> actually egress from the bond0, as the forwarding rules inside the OVS
>>>>>>>datapath can be arbitrarily complex.
>>>>>>>
>>>>>>> And all that is not limited to OVS even, as the cover letter mentions, TC
>>>>>>> or nftables in the linux bridge or even eBPF or XDP programs are not that
>>>>>>> different complexity-wise and can do most of the same things breaking the
>>>>>>> assumptions bonding code makes.
>>>>>>>
>>>>>>>>
>>>>>>>> I understand that the architecture of OVS limits the ability to
>>>>>>>> do these sorts of checks, but I'm unconvinced that implementing this
>>>>>>>> support halfway is going to create more issues than it solves.
>>>>>
>>>>> Re-reading my comment, I clearly meant "isn't going to create
>>>>> more issues" here.
>>>>>
>>>>>>>> Lastly, thinking out loud here, I'm generally loathe to add more
>>>>>>>> options to bonding, but I'm debating whether this would be worth an
>>>>>>>> "ovs-is-a-black-box" option somewhere, so that users would have to
>>>>>>>> opt-in to the OVS alternate realm.
>>>>>>
>>>>>>> I agree that adding options is almost never a great solution. But I had a
>>>>>>> similar thought. I don't think this option should be limited to OVS though,
>>>>>>>as OVS is only one of the cases where the current verification logic is not
>>>>>>>sufficient.
>>>>>
>>>>> Agreed; I wasn't really thinking about the not-OVS cases when I
>>>>>wrote that, but whatever is changed, if anything, should be generic.
>>>>
>>>>>>What if we build on the arp_ip_target setting. Allow for a list of vlan tags
>>>>>> to be appended to each target. Something like: arp_ip_target=x.x.x.x[vlan,vlan,...].
>>>>>> If a list of tags is omitted it works as before, if a list is supplied assume we know what were doing
>>>>>> and use that instead of calling bond_verify_device_path(). An empty list would be valid.
>>>>
>>>>> Hmm, that's ... not too bad; I was thinking more along the lines
>>>>>of a "skip the checks" option, but this may be a much cleaner way to do
>>>>>it.
>>>>
>>>>> That said, are you thinking that bonding would add the VLAN
>>>>>tags, or that some third party would add them? I.e., are you trying to
>>>>>accomodate the case wherein OVS, tc, or whatever, is adding a tag?
>>>>
>>>>It would be up to the administrator to add the list of tags to the
>>>>arp_target list. I am unsure how a third party could know what tags
>>>>might be added by other components. Knowing what tags to add to the
>>>>list may be hard to figure out in a complicated configuration. The
>>>>upside is it should be possible to configure for any list of tags even
>>>>if difficult.
>>>
>>> I suspect I wasn't clear in my question; what I'm asking is what
>>>you envision for the implementation within bonding for the "[vlan,vlan]"
>>>part, and by "third party," I mean "not bonding," so OVS, tc, etc.
>>>
>>> Would bonding need to add the tags in the list, or expect one of
>>>the thiry parties to do it, or some kind of mix?
>>
>>Bonding needs to add all the tags. I am just optionally replacing
>>the collection of tags by bond_verify_device_path() with a list of tags
>>supplied in the arp_target list. (Optional Per target).
>>
>>To be clear, if you chose to supply tags manually, you need to supply
>>all the tags needed for that target, not just the tags bonding could not
>>figure out on its own. An empty list [] would be valid and would just cause
>>bonding to skip tag collection.
>>
>>If OVS adds a tag it would be up to the user to know that and update
>>the bonding configuration to include all tags for the target.
>
> If OVS adds a tag, and the ARP traverses through OVS, why would
>bonding need to add that tag? Wouldn't OVS add the tag again?
>
I though the arp probes are sent directly out from bonding, so it wont
pass through OVS, therefore bonding needs to add all the tags to the probes.
>>> Does bonding need to know what tags any third party adds? I.e.,
>>>for your case 3, above, wherein OVS adds a tag, why does that fail to
>>>work?
>>
>>Today it fails since bond_verify_device_path() cant see the tags
>>added by or above OVS. Adding a list of tags [vlan vlan] or [] would effectively
>>do the the same as the "skip the checks" option. But allows a way to make
>>case 3 work.
>>
>>>
>>>>A "skip the checks" option would be easier to code. If we skip the
>>>>process of gathering tags would that eliminate any configurations with
>>>>any vlan tags?.
>>>
>>> So, yes, it would be easier to implement, and, no, I think a
>>>simple "skip the checks" switch could be implemented such that it still
>>>runs the device path and gathers the tags, but doesn't care if the end
>>>of the device path matches.
>>>
>>> That said, such an implementation is effectively the same as
>>>your original patch, except with an option instead of an OVS-ness check,
>>>and I'm still undecided on whether that's better than something that
>>>requires more specific configuration.
>>
>>Ah, ok I get it.
>>
>>The "skip the checks" option has the advantage in simplicity and will
>>fix the problem I started out solving. The downside is it wont solve more
>>complex configurations Ilya is concerned with (see case 3). I am all for starting
>>with the "kiss" approach, we can always pivot to something more complex later if we have
>>the demand.
>
> Modulo some of the implementation details from above, I'm more
>inclined at the moment towards the "specify the tags" solution (specify
>all the tags), rather than the "skip the checks" option.
>
> The reason is that, once we add an option, it generally cannot
>ever be removed, and so there isn't really a "pivot" in the sense that
>an existing option would ever go away. In that case, the only way
>forward would be to add another option (the "specify the tags" one), and
>now we've got two different options for the same thing that work
>differently. I'd like to avoid that.
Good point, Agreed, "specify the tags" is the way to go.
As a starting point I am thinking:
+struct arp_target {
+ __be32 target_ip;
+ struct bond_vlan_tag *tags;
+};
+
struct bond_params {
int mode;
int xmit_policy;
dd@@ -135,7 +140,7 @@ struct bond_params {
int ad_select;
char primary[IFNAMSIZ];
int primary_reselect;
- __be32 arp_targets[BOND_MAX_ARP_TARGETS];
+ struct arp_target arp_targets[BOND_MAX_ARP_TARGETS];
int tx_queues;
int all_slaves_active;
int resend_igmp;
Parse the list of address and tags into the array of struct arp_target.
Then bond_verify_device_path() will return arp_targets[i]->tags if it exists
or build its own list of tags as it always did.
>
>-J
>
>>>
>>>-J
>>
>>
>>>>>>>>> Signed-off-by: David J Wilder <wilder@...ibm.com>
>>>>>>>>> Signed-off-by: Pradeep Satyanarayana <pradeeps@...ux.vnet.ibm.com>
>>>>>>>>> ---
>>>>>>>>> drivers/net/bonding/bond_main.c | 8 +++++++-
>>>>>>>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>>>>>>> index 950d8e4d86f8..6f71a567ba37 100644
>>>>>>>>> --- a/drivers/net/bonding/bond_main.c
>>>>>>>>> +++ b/drivers/net/bonding/bond_main.c
>>>>>>>>> @@ -3105,7 +3105,13 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
>>>>>>>>> struct net_device *upper;
>>>>>>>>> struct list_head *iter;
>>>>>>>>>
>>>>>>>>> - if (start_dev == end_dev) {
>>>>>>>>> + /* If start_dev is an OVS port then we have encountered an openVswitch
>>>>>>>
>>>>>>> nit: Strange choice to capitalize 'V'. It should be all lowercase or a full
>>>>>>> 'Open vSwitch' instead.
>>>>>>
>>>>>>>>> + * bridge and can't go any further. The programming of the switch table
>>>>>>>>> + * will determine what packets will be sent to the bond. We can make no
>>>>>>>>> + * further assumptions about the network above the bond.
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> + if (start_dev == end_dev || netif_is_ovs_port(start_dev)) {
>>>>>>>>> tags = kcalloc(level + 1, sizeof(*tags), GFP_ATOMIC);
>>>>>>>>> if (!tags)
>>>>>>>>> return ERR_PTR(-ENOMEM);
>>>>>>>>
>>>>>>>> ---
>>>>>>>> -Jay Vosburgh, jv@...sburgh.net
>>>>>>>
>>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>>David Wilder (wilder@...ibm.com)
>>>>>
>>>>>---
>>>>> -Jay Vosburgh, jv@...sburgh.net
>>>>
>>>>David Wilder (wilder@...ibm.com)
>>>
>>>---
>>> -Jay Vosburgh, jv@...sburgh.net
>>
>
>---
> -Jay Vosburgh, jv@...sburgh.net
David Wilder (wilder@...ibm.com)
Powered by blists - more mailing lists