Message-ID: <7365e0a6-a82b-c92a-137e-f28111a9c148@virtuozzo.com>
Date: Thu, 18 Aug 2022 02:11:35 +0300
From: Andrey Zhadchenko <andrey.zhadchenko@...tuozzo.com>
To: Ilya Maximets <i.maximets@....org>, netdev@...r.kernel.org
Cc: dev@...nvswitch.org, brauner@...nel.org, edumazet@...gle.com,
avagin@...gle.com, alexander.mikhalitsyn@...tuozzo.com,
kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net,
ptikhomirov@...tuozzo.com, Aaron Conole <aconole@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 0/1] openvswitch: allow specifying
ifindex of new interfaces
On 8/18/22 01:15, Ilya Maximets wrote:
> On 8/17/22 22:35, Andrey Zhadchenko wrote:
>>
>>
>> On 8/17/22 21:19, Ilya Maximets wrote:
>>> On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
>>>> Hi!
>>>>
>>>> CRIU currently does not support checkpoint/restore of OVS configurations,
>>>> but there have been several requests for it. For example,
>>>> https://github.com/lxc/lxc/issues/2909
>>>>
>>>> The main problem is the ifindexes of newly created interfaces. We really
>>>> need to preserve them across restore. The current openvswitch API does not
>>>> allow specifying an ifindex. Most of the time we can just create an
>>>> interface via generic netlink requests and plug it into ovs, but datapaths
>>>> (generally any OVS_VPORT_TYPE_INTERNAL port) can only be created via
>>>> openvswitch requests, which do not support selecting an ifindex.
>>>
>>> Hmm. I assume you restore network interfaces by re-creating them
>>> on the target system, but I'm curious how you restore the upcall PID.
>>> Are you somehow re-creating the netlink socket with the same PID?
>>> If that is not done, no traffic will be able to flow through OVS
>>> anyway until you remove/re-add the port in userspace or re-start OVS.
>>> Or am I missing something?
>>>
>>> Best regards, Ilya Maximets.
>>
>> Yes, CRIU is able to restore the socket's nl_pid. We get it via
>> NDIAG_PROTO_ALL netlink diag requests (see net/netlink/diag.c), and the
>> upcall pid is exported by GET requests via OVS_VPORT_ATTR_UPCALL_PID.
>> So everything is fine here.
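[Editorial sketch, not part of the original mail: reading OVS_VPORT_ATTR_UPCALL_PID back from a vport GET reply amounts to walking a flat netlink attribute stream. A minimal, self-contained illustration of that walk; the attribute layout mirrors struct nlattr from <linux/netlink.h>, and the attribute value 5 is taken from <linux/openvswitch.h> but should be treated as illustrative.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Minimal netlink attribute header, mirroring struct nlattr from
 * <linux/netlink.h>; redefined here so the sketch is self-contained. */
struct nl_attr {
    uint16_t nla_len;   /* length including this header */
    uint16_t nla_type;
};

#define NLA_ALIGN4(len) (((len) + 3u) & ~3u)
#define OVS_VPORT_ATTR_UPCALL_PID 5  /* value from <linux/openvswitch.h> */

/* Walk a flat attribute stream from an OVS_VPORT_CMD_GET reply and
 * return the upcall PID, or 0 if the attribute is absent/malformed. */
static uint32_t find_upcall_pid(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    while (off + sizeof(struct nl_attr) <= len) {
        struct nl_attr a;
        memcpy(&a, buf + off, sizeof(a));
        if (a.nla_len < sizeof(a) || off + a.nla_len > len)
            break;  /* malformed stream */
        if (a.nla_type == OVS_VPORT_ATTR_UPCALL_PID &&
            a.nla_len >= sizeof(a) + sizeof(uint32_t)) {
            uint32_t pid;
            memcpy(&pid, buf + off + sizeof(a), sizeof(pid));
            return pid;
        }
        off += NLA_ALIGN4(a.nla_len);
    }
    return 0;
}

/* Helper used only by this example: append one u32 attribute. */
static size_t put_u32_attr(unsigned char *buf, size_t off,
                           uint16_t type, uint32_t val)
{
    struct nl_attr a = { (uint16_t)(sizeof(a) + sizeof(val)), type };
    memcpy(buf + off, &a, sizeof(a));
    memcpy(buf + off + sizeof(a), &val, sizeof(val));
    return off + NLA_ALIGN4(a.nla_len);
}
```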
>
> I didn't dig deep into how that works, but sounds interesting.
> Thanks for the pointers!
>
>>
>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>> mostly basic ones like veth plugging and several containers in a network.
>> We mainly supported the Weave Net k8s SDN, which is based on ovs but does
>> not use its daemon.
>>
>> Maybe if this is merged and people start using it we will find more
>> problems with checkpoint/restore, but for now the only problem is the
>> volatile ifindex.
>
> The current implementation, even with ifindexes sorted out, will not work
> for at least one reason on recent versions of OVS. Since last year OVS
> doesn't use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports
> OVS_DP_ATTR_PER_CPU_PIDS instead. That is a datapath-wide CPU ID to PID
> mapping for per-CPU upcall dispatch mode. It is used by default starting
> with OVS 2.16.
>
> So, you need to make sure you're correctly restoring 'user_features' and
> the OVS_DP_ATTR_PER_CPU_PIDS. The problem here is that
> OVS_DP_ATTR_PER_CPU_PIDS is currently not dumped to userspace via GET
> requests, simply because ovs-vswitchd has no use for it. So, you'll need
> to add that as well.
Thanks, this is very important! I will make v2 soon.
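[Editorial sketch, not part of the original mail: OVS_DP_ATTR_PER_CPU_PIDS carries a flat array of u32 netlink PIDs, one per CPU, so a dump handler only needs to emit the raw array and a restorer recovers the CPU count from the payload length. A hypothetical pack/count pair illustrating that wire format; the function names are invented for this example.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Serialize n per-CPU upcall PIDs into an attribute payload, the way a
 * GET dump for OVS_DP_ATTR_PER_CPU_PIDS would; returns the byte count. */
static size_t pack_per_cpu_pids(unsigned char *payload,
                                const uint32_t *pids, size_t n)
{
    memcpy(payload, pids, n * sizeof(uint32_t));
    return n * sizeof(uint32_t);
}

/* On restore, the number of saved CPUs falls out of the payload length,
 * mirroring how the kernel validates the attribute. */
static size_t per_cpu_pid_count(size_t payload_len)
{
    return payload_len / sizeof(uint32_t);
}
```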
>
> And there could be some issues when starting OVS from a checkpoint created
> on a system with different number of CPU cores. Traffic will not be broken,
> but performance may be affected, and there might be some kernel warnings.
Migrating to another type of CPU is usually a challenge due to different
CPUID values and some other problems (do we handle cpu cgroup values if
ncpus changed? honestly, no idea). Anyway, thanks for pointing that out.
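[Editorial sketch, not part of the original mail: when a checkpoint taken on an M-core host is restored on an N-core host, the saved per-CPU PID array no longer lines up with CPU IDs. The kernel's per-CPU dispatch falls back to a modulo wrap in that case, so traffic keeps flowing at the cost of uneven upcall distribution, which matches the "performance may be affected" caveat above. A minimal illustration of that fallback:]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Pick the upcall socket PID for a given CPU. If the restored array is
 * shorter than the current CPU count, wrap around rather than drop. */
static uint32_t upcall_pid_for_cpu(const uint32_t *pids, size_t n_pids,
                                   unsigned int cpu)
{
    if (n_pids == 0)
        return 0;               /* no sockets: upcalls would be dropped */
    if (cpu < n_pids)
        return pids[cpu];       /* normal case: one PID per CPU */
    return pids[cpu % n_pids];  /* restored array smaller than ncpus */
}
```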
>
> If you won't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
>
> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
> not critical, but would be nice to restore as well, if you're not doing
> that already.
Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
additionally patch the kernel part. The current CRIU implementation does
not care about it, but it is not hard to include.
>
>>
>> Best regards, Andrey Zhadchenko
>>>
>>>>
>>>> This patch allows to do so.
>>>> For new datapaths I decided to use dp_infindex in header as infindex
>>>> because it control ifindex for other requests too.
>>>> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
>>>>
>>>> The only concern I have is that previously dp_ifindex was not used for
>>>> OVS_DP_VMD_NEW requests and some software may not set it to zero. However
>>>> we have been running this patch at Virtuozzo for 2 years and have not
>>>> encountered this problem. Not sure if it is worth to add new
>>>> ovs_datapath_attr instead.
>>>>
>>>>
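[Editorial sketch, not part of the original mail: the dp_ifindex reuse described above means a userspace restorer would simply place the desired ifindex into struct ovs_header when building the OVS_DP_CMD_NEW request, with 0 keeping the old "kernel picks" behaviour. A hypothetical request builder; the struct and the cmd value 1 mirror <linux/openvswitch.h> but are redefined here for illustration.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct ovs_header from <linux/openvswitch.h>. */
struct ovs_hdr {
    int dp_ifindex;
};

/* Illustrative request shape: genl cmd byte plus the ovs header. */
struct dp_new_req {
    uint8_t cmd;            /* OVS_DP_CMD_NEW (1 in the uapi enum) */
    struct ovs_hdr hdr;
};

/* desired_ifindex == 0 keeps the pre-patch "kernel assigns" behaviour;
 * a non-zero value asks the kernel for that specific ifindex. */
static void build_dp_new(struct dp_new_req *req, int desired_ifindex)
{
    memset(req, 0, sizeof(*req));
    req->cmd = 1;                       /* OVS_DP_CMD_NEW */
    req->hdr.dp_ifindex = desired_ifindex;
}
```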
>>>> As a broader solution, another generic approach is possible: modify
>>>> __dev_change_net_namespace() to allow changing ifindexes within the same
>>>> netns. Yet we would still need to ignore NETIF_F_NETNS_LOCAL, and I am
>>>> not sure that all its users are ready for an ifindex change.
>>>> This would indeed be better for CRIU, so we wouldn't need to change
>>>> every SDN module to be able to checkpoint/restore it, and it would
>>>> probably avoid some bloat. What do you think of this?
>>>>
>>>> Andrey Zhadchenko (1):
>>>> openvswitch: allow specifying ifindex of new interfaces
>>>>
>>>> include/uapi/linux/openvswitch.h | 4 ++++
>>>> net/openvswitch/datapath.c | 16 ++++++++++++++--
>>>> net/openvswitch/vport-internal_dev.c | 1 +
>>>> net/openvswitch/vport.h | 2 ++
>>>> 4 files changed, 21 insertions(+), 2 deletions(-)
>>>>
>>>
>