Message-ID: <f18e2734-ddf6-fbbc-d2ac-a2acb65fd073@ovn.org>
Date: Thu, 18 Aug 2022 13:06:52 +0200
From: Ilya Maximets <i.maximets@....org>
To: Andrey Zhadchenko <andrey.zhadchenko@...tuozzo.com>,
netdev@...r.kernel.org
Cc: i.maximets@....org, dev@...nvswitch.org, brauner@...nel.org,
edumazet@...gle.com, avagin@...gle.com,
alexander.mikhalitsyn@...tuozzo.com, kuba@...nel.org,
pabeni@...hat.com, davem@...emloft.net, ptikhomirov@...tuozzo.com,
Aaron Conole <aconole@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 0/1] openvswitch: allow specifying
ifindex of new interfaces
On 8/18/22 01:11, Andrey Zhadchenko wrote:
>
>
> On 8/18/22 01:15, Ilya Maximets wrote:
>> On 8/17/22 22:35, Andrey Zhadchenko wrote:
>>>
>>>
>>> On 8/17/22 21:19, Ilya Maximets wrote:
>>>> On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
>>>>> Hi!
>>>>>
>>>>> CRIU currently does not support checkpoint/restore of OVS configurations,
>>>>> but there have been several requests for it. For example,
>>>>> https://github.com/lxc/lxc/issues/2909
>>>>>
>>>>> The main problem is the ifindexes of newly created interfaces. We really
>>>>> need to preserve them after restore. The current openvswitch API does not
>>>>> allow specifying an ifindex. Most of the time we can just create an
>>>>> interface via generic netlink requests and plug it into OVS, but datapaths
>>>>> (generally any OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch
>>>>> requests, which do not support selecting an ifindex.
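
For context on the request shape: datapaths are created through the
OVS_DATAPATH_FAMILY generic netlink family, and struct ovs_header.dp_ifindex
is the field the patch proposes to reuse for selecting the ifindex. A minimal
libnl-3 sketch, untested and with error handling omitted; note that current
kernels ignore dp_ifindex on OVS_DP_CMD_NEW, so the assignment below is the
proposed semantics, not existing behavior:

#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>
#include <linux/openvswitch.h>

/* Create a datapath, asking for a specific ifindex via
 * ovs_header.dp_ifindex (proposed semantics). */
static int create_dp(const char *name, int wanted_ifindex, uint32_t upcall_pid)
{
        struct nl_sock *sk = nl_socket_alloc();
        struct ovs_header *hdr;
        struct nl_msg *msg;
        int family, err;

        genl_connect(sk);
        family = genl_ctrl_resolve(sk, OVS_DATAPATH_FAMILY);

        msg = nlmsg_alloc();
        hdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
                          sizeof(*hdr), 0, OVS_DP_CMD_NEW, OVS_DATAPATH_VERSION);
        hdr->dp_ifindex = wanted_ifindex;       /* proposed: desired ifindex */

        nla_put_string(msg, OVS_DP_ATTR_NAME, name);
        nla_put_u32(msg, OVS_DP_ATTR_UPCALL_PID, upcall_pid);

        err = nl_send_sync(sk, msg);    /* sends, waits for ACK, frees msg */
        nl_socket_free(sk);
        return err;
}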
>>>>
>>>> Hmm. I assume you restore network interfaces by re-creating them
>>>> on the target system, but I'm curious: how do you restore the upcall PID?
>>>> Are you somehow re-creating the netlink socket with the same PID?
>>>> If that is not done, no traffic will be able to flow through OVS
>>>> anyway until you remove/re-add the port in userspace or re-start OVS.
>>>> Or am I missing something?
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>> Yes, CRIU is able to restore a socket's nl_pid. We get it via NDIAG_PROTO_ALL
>>> netlink protocol requests (see net/netlink/diag.c). The upcall PID is exported
>>> by GET requests via OVS_VPORT_ATTR_UPCALL_PID.
>>> So everything is fine here.
>>
>> I didn't dig deep into how that works, but sounds interesting.
>> Thanks for the pointers!
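
For anyone following along: this works because bind() on a netlink socket
accepts an explicit nl_pid, so a restorer can re-bind with the value recorded
at dump time. A minimal sketch, untested; it fails if the pid is already taken:

#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Re-create a generic netlink socket with the nl_pid saved at dump time. */
static int restore_nl_socket(uint32_t saved_pid)
{
        struct sockaddr_nl addr = {
                .nl_family = AF_NETLINK,
                .nl_pid    = saved_pid,         /* pid recorded at dump time */
        };
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
                return -1;
        return fd;
}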
>>
>>>
>>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>>> mostly basic ones like veth plugging and several containers in a network. We
>>> mainly supported the Weave Net k8s SDN, which is based on OVS but does not
>>> use its daemon.
>>>
>>> Maybe if this is merged and people start using it, we will find more problems
>>> with checkpoint/restore, but for now the only problem is the volatile ifindex.
>>
>> Even with the ifindexes sorted out, the current implementation will not work
>> with recent versions of OVS, for at least one reason: since last year, OVS
>> doesn't use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports
>> OVS_DP_ATTR_PER_CPU_PIDS instead. That attribute is a datapath-wide CPU ID to
>> PID mapping for the per-CPU upcall dispatch mode, which is used by default
>> starting with OVS 2.16.
>> So, you need to make sure you're correctly restoring 'user_features' and
>> OVS_DP_ATTR_PER_CPU_PIDS. The problem here is that OVS_DP_ATTR_PER_CPU_PIDS is
>> currently not dumped to userspace via GET requests, simply because ovs-vswitchd
>> has no use for it. So, you'll need to add that as well.
> Thanks, this is very important! I will make v2 soon.
>
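
In case it helps with v2: the dump side might look roughly like the snippet
below, in ovs_dp_cmd_fill_info() in net/openvswitch/datapath.c. This is an
untested sketch from memory; 'upcall_portids' and 'struct dp_nlsk_pids' are
the names used by the per-cpu dispatch code, IIRC:

        /* Sketch: dump the per-CPU upcall PIDs in GET/NEW replies. */
        if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) {
                struct dp_nlsk_pids *pids;

                pids = rcu_dereference_ovsl(dp->upcall_portids);
                if (nla_put(skb, OVS_DP_ATTR_PER_CPU_PIDS,
                            pids->n_pids * sizeof(u32), pids->pids))
                        goto nla_put_failure;
        }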
>>
>> And there could be some issues when starting OVS from a checkpoint created
>> on a system with different number of CPU cores. Traffic will not be broken,
>> but performance may be affected, and there might be some kernel warnings.
> Migrating to another type of CPU is usually a challenge due to different CPUID
> values and some other problems (do we handle cpu cgroup values if ncpus
> changed? No idea, honestly). Anyway, thanks for pointing that out.
TBH, it is a challenge just to figure out the CPU IDs on the system you're
running on; migration to a different CPU topology will indeed be a major pain if
you want to preserve performance characteristics on the new setup. It's probably
much easier to re-start ovs-vswitchd after you have migrated the kernel bits,
though that doesn't seem like something CRIU would like to do, and I understand
that. Current ovs-vswitchd will not react to sudden changes in CPU
topology/affinity. Reacting to changes in CPU affinity, though, is something
we're exploring, because it can happen in k8s-like environments with various
performance monitoring/tweaking tools involved.
Regarding more "complex" scenarios, I should also mention the qdiscs that OVS
creates and attaches to the interfaces it manages. These can be for QoS, ingress
policing, or offloading to TC flower. OVS may get confused if these qdiscs
suddenly disappear. That may cause some traffic to stop flowing or to be
directed where it shouldn't be. I don't know if CRIU covers that part.
There are also basic XDP programs that can be attached to interfaces, along with
registered umem blocks, if users are running the userspace datapath with AF_XDP
ports. Are AF_XDP sockets or io_uring something that CRIU is able to migrate?
Just curious.
>> If you don't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
>> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
>>
>> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
>> not critical, but would be nice to restore as well, if you're not doing
>> that already.
> Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
> additionally patch the kernel part. The current CRIU implementation does not
> care about it, but it is not hard to include.
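
Right. And for completeness, restoring it should just be an OVS_DP_CMD_SET
with the saved value; roughly like the untested sketch below, where 'sk' and
'family' are a connected socket and the resolved OVS_DATAPATH_FAMILY:

#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <linux/openvswitch.h>

/* Restore the saved masks-cache size on an existing datapath. */
static int restore_masks_cache(struct nl_sock *sk, int family,
                               int dp_ifindex, uint32_t saved_size)
{
        struct nl_msg *msg = nlmsg_alloc();
        struct ovs_header *hdr;

        hdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
                          sizeof(*hdr), 0, OVS_DP_CMD_SET, OVS_DATAPATH_VERSION);
        hdr->dp_ifindex = dp_ifindex;   /* which datapath to update */

        nla_put_u32(msg, OVS_DP_ATTR_MASKS_CACHE_SIZE, saved_size);

        return nl_send_sync(sk, msg);   /* sends, waits for ACK, frees msg */
}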
>
>>
>>>
>>> Best regards, Andrey Zhadchenko
>>>>
>>>>>
>>>>> This patch allows doing so.
>>>>> For new datapaths I decided to use dp_ifindex in the header as the ifindex,
>>>>> because it controls the ifindex for other requests too.
>>>>> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
>>>>>
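
A vport request using it would then look roughly like the untested libnl-3
sketch below; the OVS_VPORT_ATTR_IFINDEX input semantics are what this patch
proposes, as today the attribute is output-only. 'sk' and 'family' are a
connected socket and the resolved OVS_VPORT_FAMILY:

#include <stdint.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <linux/openvswitch.h>

/* Create an internal vport, asking for a specific ifindex. */
static int add_internal_vport(struct nl_sock *sk, int family, int dp_ifindex,
                              const char *name, int wanted_ifindex,
                              uint32_t upcall_pid)
{
        struct nl_msg *msg = nlmsg_alloc();
        struct ovs_header *hdr;

        hdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
                          sizeof(*hdr), 0, OVS_VPORT_CMD_NEW, OVS_VPORT_VERSION);
        hdr->dp_ifindex = dp_ifindex;   /* datapath to attach to */

        nla_put_string(msg, OVS_VPORT_ATTR_NAME, name);
        nla_put_u32(msg, OVS_VPORT_ATTR_TYPE, OVS_VPORT_TYPE_INTERNAL);
        nla_put_u32(msg, OVS_VPORT_ATTR_UPCALL_PID, upcall_pid);
        nla_put_u32(msg, OVS_VPORT_ATTR_IFINDEX, wanted_ifindex); /* proposed */

        return nl_send_sync(sk, msg);   /* sends, waits for ACK, frees msg */
}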
>>>>> The only concern I have is that previously dp_ifindex was not used for
>>>>> OVS_DP_CMD_NEW requests, and some software may not set it to zero. However,
>>>>> we have been running this patch at Virtuozzo for 2 years and have not
>>>>> encountered this problem. I am not sure whether it is worth adding a new
>>>>> ovs_datapath_attr instead.
>>>>>
>>>>>
>>>>> As a broader solution, another, generic approach is possible: modify
>>>>> __dev_change_net_namespace() to allow changing ifindexes within the same
>>>>> netns. Yet we would still need to ignore NETIF_F_NETNS_LOCAL, and I am not
>>>>> sure that all its users are ready for an ifindex change.
>>>>> This would indeed be better for CRIU, since we wouldn't need to change every
>>>>> SDN module to be able to checkpoint/restore it, and it would probably avoid
>>>>> some bloat. What do you think of this?
>>>>>
>>>>> Andrey Zhadchenko (1):
>>>>> openvswitch: allow specifying ifindex of new interfaces
>>>>>
>>>>> include/uapi/linux/openvswitch.h | 4 ++++
>>>>> net/openvswitch/datapath.c | 16 ++++++++++++++--
>>>>> net/openvswitch/vport-internal_dev.c | 1 +
>>>>> net/openvswitch/vport.h | 2 ++
>>>>> 4 files changed, 21 insertions(+), 2 deletions(-)
>>>>>
>>>>
>>