Message-ID: <CAG=2xmOVUrmBVF+ORfR0rO=nP0t7aDkPxcrzd0sp0FBT9fqBKw@mail.gmail.com>
Date: Tue, 17 Sep 2024 09:10:11 +0000
From: Adrián Moreno <amorenoz@...hat.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: Hangbin Liu <liuhangbin@...il.com>, netdev@...r.kernel.org,
Andy Gospodarek <andy@...yhouse.net>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
Nikolay Aleksandrov <razor@...ckwall.org>, Simon Horman <horms@...nel.org>,
Aaron Conole <aconole@...hat.com>, Ilya Maximets <i.maximets@....org>,
Stanislas Faye <sfaye@...hat.com>
Subject: Re: [Discuss] ARP monitor for OVS bridge over bonding
On Thu, Sep 12, 2024 at 09:36:13AM GMT, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@...il.com> wrote:
>
> >Hi all,
> >
> >Recently, a customer of ours hit an issue with an OVS bridge over bonding, e.g.
> >
> >   eth0    eth1
> >    |       |
> >   -- bond0 --
> >        |
> >      br-ex (ovs-vsctl add-port br-ex bond0; ip addr add 192.168.1.1/24 dev br-ex)
> >
> >
> >Before sending ARP probes to monitor the bond slaves, the bond needs to
> >check whether br-ex is in the same datapath as bond0 via
> >bond_verify_device_path(), which uses netdev_for_each_upper_dev_rcu()
> >to walk all upper devices. This works with a normal Linux bridge, but
> >with an OVS bridge the upper device is "ovs-system" instead of br-ex.
> >
> >After talking with the OVS developers, it turned out that the real upper
> >OVS topology looks like:
> >
> >          --------------------------------
> >          |                              |
> > br-ex ---+-- ovs-system                 |
> >          |                              |
> > br-int --+--                            |
> >          |                              |
> >          |    bond0    eth2   veth42    |
> >          |      |       |       |       |
> >          |      |       |       |       |
> >          -------+-------+-------+--------
> >                 |       |       |
> >              +--+--+ physical   |
> >              |     |   link     |
> >            eth0  eth1         veth43
> >
> >br-ex is not an upper device of bond0; instead, ovs-system is the master
> >of bond0. This makes it impossible for us to verify that br-ex and bond0
> >are in the same datapath.
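A simplified userspace model may help show why the check fails. This is not kernel code: `struct model_dev` and `reachable_via_uppers()` are hypothetical stand-ins for `struct net_device` and the recursive upper-device walk that bond_verify_device_path() performs with netdev_for_each_upper_dev_rcu(). With a Linux bridge, br0 is an upper device of bond0 and the walk finds it; with OVS, bond0's only upper device is ovs-system, and br-ex is not above bond0 in the stack, so the walk never reaches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct net_device: only the
 * name and the upper-device links matter for this check. */
struct model_dev {
	const char *name;
	struct model_dev *upper[4];
	size_t n_upper;
};

/* Returns true if 'target' (the device the route to the ARP target
 * leaves through) can be reached from 'start' (the bond) by following
 * upper links recursively, mirroring the shape of the kernel walk. */
static bool reachable_via_uppers(const struct model_dev *start,
				 const struct model_dev *target)
{
	assert(start != NULL && target != NULL);
	if (start == target)
		return true;
	for (size_t i = 0; i < start->n_upper; i++)
		if (reachable_via_uppers(start->upper[i], target))
			return true;
	return false;
}
```

For example, with `bond0.upper = { &br0 }` the check succeeds for br0, whereas with `bond0.upper = { &ovs_system }` it fails for br-ex and succeeds only for ovs-system itself, matching the behaviour reported above.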
>
> I'm guessing that this is in the context of an OpenStack
> deployment, as "br-ex" and "br-int" are names commonly chosen for the
> OVS bridges in OpenStack.
>
> But, yes, OVS bridge configuration is very different from the
> linux bridge, and the ARP monitor was not designed with OVS in mind.
>
> I'll also point out that OVS has its own bonding, although it
> does not implement functionality equivalent to the ARP monitor.
>
> However, OVS does provide an implementation of RFC 5880 BFD
> (Bidirectional Forwarding Detection). The OpenStack deployments that
> I'm familiar with typically use the kernel bonding in LACP mode along
> with BFD. Is there a reason that OVS + BFD is unsuitable for your
> purposes?
>
> >On the other hand, as Adrián Moreno said, the packets generated on br-ex
> >could be routed anywhere using OpenFlow rules (including eth2 in the
> >diagram). The same applies to a normal bridge: with tc/netfilter rules,
> >packets could also be routed to an interface other than bond0.
>
> True, and, at least in the OpenStack OVN/OVS deployments I'm
> familiar with, heavy use of OpenFlow rules is the usual configuration.
> Those deployments also make use of tc rules for various purposes.
>
> >So the rt->dst.dev check in bond_arp_send_all() is not always correct.
> >Stanislas suggested adding a new parameter like 'arp monitor source interface'
> >to bonding that the user could supply. Then we could do something like:
> >    if (rt->dst.dev == arp_src_iface->dev)
> >        goto found;
> >
> >What do you think?
>
> A single "arp_src_iface" parameter won't scale if there are
> multiple ARP targets, as each target might need a different
> "arp_src_iface."
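To make the scaling concern concrete: when two targets are routed out of different devices, one global setting can match only one of them. Everything in this sketch is hypothetical (`struct arp_target`, its fields, and both counting functions are illustrative names, not bonding options or kernel APIs); `route_dev` stands in for the device that `rt->dst.dev` would point to after the route lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target record. */
struct arp_target {
	const char *ip;        /* ARP target address */
	const char *route_dev; /* device picked by the route lookup */
	const char *src_iface; /* hypothetical per-target override, may be NULL */
};

/* Counts targets accepted by a single, global source-interface setting. */
static size_t count_ok_single(const struct arp_target *ts, size_t n,
			      const char *arp_src_iface)
{
	size_t ok = 0;

	assert(ts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (strcmp(ts[i].route_dev, arp_src_iface) == 0)
			ok++;
	return ok;
}

/* Counts targets accepted when each target carries its own setting. */
static size_t count_ok_per_target(const struct arp_target *ts, size_t n)
{
	size_t ok = 0;

	assert(ts != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ts[i].src_iface != NULL &&
		    strcmp(ts[i].route_dev, ts[i].src_iface) == 0)
			ok++;
	return ok;
}
```

With one target routed via br-ex and another via br-int, a global setting of "br-ex" accepts only the first, while per-target settings accept both.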
>
> Also, the original purpose of bond_verify_device_path() is to
> return VLAN tags in the device stack so that the ARP will be properly
> tagged.
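That second role can be sketched with a similar hypothetical model, extended with a `vlan_id` field that is nonzero only for VLAN devices: on the way from the bond up to the route's device, the walk records the ID of every VLAN device it passes through, so the ARP can be tagged accordingly. In the kernel the function returns an array of `struct bond_vlan_tag` entries that also carry the VLAN protocol; this model keeps only the IDs, and `MODEL_MAX_TAGS` is an assumed depth limit.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TAGS 8 /* hypothetical limit on stacked VLAN devices */

struct model_dev {
	const char *name;
	int vlan_id;                /* > 0 for a VLAN device, 0 otherwise */
	struct model_dev *upper[4];
	size_t n_upper;
};

/* Walks upper links from 'start' looking for 'end'. On success returns
 * the number of VLAN IDs written to 'tags' (nearest to the bond first);
 * returns -1 if 'end' is not above 'start' in the device stack. */
static int path_vlan_tags(const struct model_dev *start,
			  const struct model_dev *end,
			  int tags[MODEL_MAX_TAGS], int ntags)
{
	assert(ntags >= 0 && ntags <= MODEL_MAX_TAGS);
	if (start == end)
		return ntags;
	for (size_t i = 0; i < start->n_upper; i++) {
		const struct model_dev *up = start->upper[i];
		int n = ntags;

		if (up->vlan_id > 0) {
			if (n == MODEL_MAX_TAGS)
				continue; /* too deep: give up on this branch */
			tags[n++] = up->vlan_id;
		}
		int found = path_vlan_tags(up, end, tags, n);
		if (found >= 0)
			return found;
	}
	return -1;
}
```

For a stack bond0 -> bond0.100 -> br0, the walk from bond0 to br0 returns one tag, 100; a device that is not above bond0 yields -1.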
>
> I think what you're really asking for is an "I know what I'm
> doing" option to bypass the checks in bond_arp_send_all(). That would
> also skip the VLAN tag search, so it's not necessarily a perfect
> solution.
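The trade-off can be sketched as a per-target decision. Everything here is hypothetical: `skip_path_check` stands for the "I know what I'm doing" knob (an illustrative name, not an existing bonding parameter), and `path_ok`/`ntags` stand for the outcome of the device-path walk. The last case is the point: skipping the walk also skips the VLAN tag search, so a target that would have been probed with a tag goes out untagged.

```c
#include <assert.h>
#include <stdbool.h>

enum arp_action {
	ARP_SKIP,          /* target not reachable through the bond: no probe */
	ARP_SEND_UNTAGGED, /* send the probe without VLAN tags */
	ARP_SEND_TAGGED,   /* send the probe with the tags found on the path */
};

/* 'skip_path_check' models a hypothetical bypass option; 'path_ok' and
 * 'ntags' model the result of the device-path verification. */
static enum arp_action arp_decide(bool skip_path_check, bool path_ok,
				  int ntags)
{
	assert(ntags >= 0);
	if (skip_path_check)
		return ARP_SEND_UNTAGGED; /* no walk, so no VLAN tags either */
	if (!path_ok)
		return ARP_SKIP;
	return ntags > 0 ? ARP_SEND_TAGGED : ARP_SEND_UNTAGGED;
}
```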

I agree this is a better approach than "arp_src_iface", and that it's
still not perfect. For OVS bridges, the VLAN information lives in
userspace, so we don't have a good way of retrieving it.

Also, this flag would apply to all ARP targets, although I cannot think
of any topology that would require monitoring addresses on both OVS and
non-OVS interfaces.

Another possible approach would be to internally encode which interface
types honor the "stacking is datapath" assumption. I also dislike this,
given the flexibility netfilter and eBPF (and OpenFlow, for that matter)
have to create virtual datapaths independent of interface stacking, even
on bridges.

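For comparison, the type-based approach could look roughly like this. The allowlist contents are assumptions for illustration: "bridge" and "vlan" are the link kinds of the Linux bridge and VLAN drivers, and "openvswitch" is assumed here to be the kind reported for ovs-system. Even an allowlisted bridge can have its real datapath rewritten by netfilter or eBPF, which is exactly the weakness of this approach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist of link kinds for which "upper device in the
 * stack == next hop in the datapath" is assumed to hold. */
static const char *const stacking_is_datapath_kinds[] = {
	"bridge",
	"vlan",
};

static bool kind_honors_stacking(const char *kind)
{
	size_t n = sizeof(stacking_is_datapath_kinds) /
		   sizeof(stacking_is_datapath_kinds[0]);

	assert(kind != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(kind, stacking_is_datapath_kinds[i]) == 0)
			return true;
	return false;
}
```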
Thanks.
Adrián
>
> Before considering such a change, I'd like to know why OVS + BFD
> over a kernel bond attached to the OVS bridge is unsuitable for your use
> case, as that's a common configuration I've seen with OVS.
>
> -J
>
> ---
> -Jay Vosburgh, jv@...sburgh.net
>