Message-ID: <659b7215-88e6-df66-8d38-aab1eac4f531@virtuozzo.com>
Date: Thu, 18 Aug 2022 15:26:29 +0300
From: Andrey Zhadchenko <andrey.zhadchenko@...tuozzo.com>
To: Ilya Maximets <i.maximets@....org>, netdev@...r.kernel.org
Cc: dev@...nvswitch.org, brauner@...nel.org, edumazet@...gle.com,
avagin@...gle.com, alexander.mikhalitsyn@...tuozzo.com,
kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net,
ptikhomirov@...tuozzo.com, Aaron Conole <aconole@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 0/1] openvswitch: allow specifying
ifindex of new interfaces
On 8/18/22 14:06, Ilya Maximets wrote:
> On 8/18/22 01:11, Andrey Zhadchenko wrote:
>>
>>
>> On 8/18/22 01:15, Ilya Maximets wrote:
>>> On 8/17/22 22:35, Andrey Zhadchenko wrote:
>>>>
>>>>
>>>> On 8/17/22 21:19, Ilya Maximets wrote:
>>>>> On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
>>>>>> Hi!
>>>>>>
>>>>>> CRIU currently does not support checkpoint/restore of OVS configurations, but
>>>>>> there have been several requests for it. For example,
>>>>>> https://github.com/lxc/lxc/issues/2909
>>>>>>
>>>>>> The main problem is the ifindexes of newly created interfaces. We really need
>>>>>> to preserve them after restore, but the current openvswitch API does not allow
>>>>>> specifying an ifindex. Most of the time we can just create an interface via
>>>>>> generic netlink requests and plug it into ovs, but datapaths (generally any
>>>>>> OVS_VPORT_TYPE_INTERNAL vport) can only be created via openvswitch requests,
>>>>>> which do not support selecting an ifindex.
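
For context (not part of the patch): for ordinary devices CRIU can already ask for
a specific ifindex when re-creating a link, because rtnetlink honours a non-zero
ifi_index in RTM_NEWLINK. A rough, untested sketch of that path is below; the
"dummy" kind and the helper name are only for illustration. There is no equivalent
knob in the openvswitch requests that create internal vports, which is the gap
described above.

/*
 * Untested illustration: create an ordinary link with a requested ifindex
 * via rtnetlink by setting ifi_index in RTM_NEWLINK.  Error handling and
 * reading the kernel's reply are omitted.
 */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

static int create_link_with_ifindex(const char *name, int ifindex)
{
	struct {
		struct nlmsghdr nlh;
		struct ifinfomsg ifm;
		char attrs[256];
	} req;
	struct rtattr *linkinfo, *rta;
	int fd, ret;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifm));
	req.nlh.nlmsg_type = RTM_NEWLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
	req.ifm.ifi_family = AF_UNSPEC;
	req.ifm.ifi_index = ifindex;	/* the ifindex we want to get back */

	/* IFLA_IFNAME */
	rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nlh.nlmsg_len));
	rta->rta_type = IFLA_IFNAME;
	rta->rta_len = RTA_LENGTH(strlen(name) + 1);
	strcpy(RTA_DATA(rta), name);
	req.nlh.nlmsg_len = NLMSG_ALIGN(req.nlh.nlmsg_len) + RTA_ALIGN(rta->rta_len);

	/* IFLA_LINKINFO -> IFLA_INFO_KIND = "dummy" (example kind only) */
	linkinfo = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nlh.nlmsg_len));
	linkinfo->rta_type = IFLA_LINKINFO;
	rta = RTA_DATA(linkinfo);
	rta->rta_type = IFLA_INFO_KIND;
	rta->rta_len = RTA_LENGTH(strlen("dummy"));
	memcpy(RTA_DATA(rta), "dummy", strlen("dummy"));
	linkinfo->rta_len = RTA_LENGTH(RTA_ALIGN(rta->rta_len));
	req.nlh.nlmsg_len = NLMSG_ALIGN(req.nlh.nlmsg_len) + RTA_ALIGN(linkinfo->rta_len);

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;
	ret = send(fd, &req, req.nlh.nlmsg_len, 0);
	close(fd);
	return ret < 0 ? -1 : 0;
}

The real restore code is of course more involved (addresses, MTU, and so on),
but the ifi_index assignment is the part that matters here.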
>>>>>
>>>>> Hmm. I assume you restore network interfaces by re-creating them
>>>>> on the target system, but I'm curious how you restore the upcall PID.
>>>>> Are you somehow re-creating the netlink socket with the same PID?
>>>>> If that is not done, no traffic will be able to flow through OVS
>>>>> anyway until you remove/re-add the port in userspace or re-start OVS.
>>>>> Or am I missing something?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Yes, CRIU is able to restore a socket's nl_pid. We get it via netlink diag
>>>> requests with NDIAG_PROTO_ALL (see net/netlink/diag.c). The upcall pid is
>>>> exported by GET requests via OVS_VPORT_ATTR_UPCALL_PID.
>>>> So everything is fine here.
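
For reference, the nl_pid side can be queried roughly like this (untested sketch,
not the actual CRIU code; group membership and other socket state are ignored):

/*
 * Dump AF_NETLINK sockets with SOCK_DIAG_BY_FAMILY / NDIAG_PROTO_ALL and
 * print their portids -- the mechanism exposed by net/netlink/diag.c.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/netlink_diag.h>

int main(void)
{
	struct {
		struct nlmsghdr nlh;
		struct netlink_diag_req req;
	} msg = {
		.nlh = {
			.nlmsg_len = sizeof(msg),
			.nlmsg_type = SOCK_DIAG_BY_FAMILY,
			.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
		},
		.req = {
			.sdiag_family = AF_NETLINK,
			.sdiag_protocol = NDIAG_PROTO_ALL,
		},
	};
	char buf[16384];
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
	ssize_t len;

	if (fd < 0 || send(fd, &msg, sizeof(msg), 0) < 0)
		return 1;

	while ((len = recv(fd, buf, sizeof(buf), 0)) > 0) {
		struct nlmsghdr *h = (struct nlmsghdr *)buf;

		for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
			struct netlink_diag_msg *d;

			if (h->nlmsg_type == NLMSG_DONE)
				goto out;
			if (h->nlmsg_type != SOCK_DIAG_BY_FAMILY)
				continue;
			d = NLMSG_DATA(h);
			printf("proto %u portid %u\n",
			       d->ndiag_protocol, d->ndiag_portid);
		}
	}
out:
	close(fd);
	return 0;
}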
>>>
>>> I didn't dig deep into how that works, but sounds interesting.
>>> Thanks for the pointers!
>>>
>>>>
>>>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>>>> mostly basic ones like veth plugging and several containers in a network. We
>>>> mainly supported the Weave Net k8s SDN, which is based on ovs but does not use
>>>> its daemon.
>>>>
>>>> Maybe if this is merged and people start using this we will find more problems
>>>> with checkpoint/restore, but for now the only problem is the volatile ifindex.
>>>
>>> The current implementation, even with ifindexes sorted out, will not work with
>>> recent versions of OVS for at least one reason. Since last year OVS doesn't
>>> use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports OVS_DP_ATTR_PER_CPU_PIDS
>>> instead. It's a datapath-wide CPU ID to PID mapping for per-CPU upcall
>>> dispatch mode. It is used by default starting with OVS 2.16.
>>>
>>> So, you need to make sure you're correctly restoring 'user_features' and
>>> OVS_DP_ATTR_PER_CPU_PIDS. The problem here is that OVS_DP_ATTR_PER_CPU_PIDS is
>>> currently not dumped to userspace via GET requests, simply because ovs-vswitchd
>>> has no use for it. So, you'll need to add that as well.
>> Thanks, this is very important! I will make v2 soon.
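
For v2 I am thinking of something along these lines on the kernel side (very rough
and untested, not the actual patch; names like dp->upcall_portids and struct
dp_nlsk_pids are my assumptions based on net/openvswitch/datapath.h, and the exact
locking may need to be different):

/*
 * Rough sketch: dump the per-CPU upcall PIDs so that userspace (e.g. CRIU)
 * can read them back via GET requests.  Intended to be called from
 * ovs_dp_cmd_fill_info(), next to where user_features is already dumped.
 */
static int ovs_dp_put_upcall_pids(struct sk_buff *skb, struct datapath *dp)
{
	struct dp_nlsk_pids *pids;
	int err = 0;

	/* Only meaningful in per-CPU dispatch mode. */
	if (!(dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU))
		return 0;

	rcu_read_lock();
	pids = rcu_dereference(dp->upcall_portids);
	if (pids && nla_put(skb, OVS_DP_ATTR_PER_CPU_PIDS,
			    pids->n_pids * sizeof(u32), pids->pids))
		err = -EMSGSIZE;
	rcu_read_unlock();

	return err;
}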
>>
>>>
>>> And there could be some issues when starting OVS from a checkpoint created
>>> on a system with a different number of CPU cores. Traffic will not be broken,
>>> but performance may be affected, and there might be some kernel warnings.
>> Migrating to another type of CPU is usually a challenge due to different CPUID
>> values and some other problems (do we handle cpu cgroup values if the number of
>> CPUs changed? no idea, honestly). Anyway, thanks for pointing that out.
>
> TBH, it is a challenge to just figure out CPU IDs on a system you're running on,
> and migration to a different CPU topology will indeed be a major pain if you want
> to preserve performance characteristics on the new setup. It's probably much easier
> to re-start ovs-vswitchd after you have migrated the kernel bits, though that doesn't
> seem like something that CRIU would like to do, and I understand that. Current
> ovs-vswitchd will not react to sudden changes in CPU topology/affinity. Reacting
> to changes in CPU affinity, though, is something we're exploring, because it can
> happen in k8s-like environments with different performance monitoring/tweaking
> tools involved.
>
> Regarding more "complex" scenarios, I should also mention the qdiscs that OVS creates
> and attaches to interfaces it manages. These could be for the purpose of QoS,
> ingress policing or offloading to TC flower. OVS may be confused if these
> qdiscs suddenly disappear. This may cause some traffic to stop flowing
> or be directed to where it shouldn't be. Don't know if CRIU covers that part.
>
> There are also basic XDP programs that could be attached to interfaces, along with
> registered umem blocks, if users are running the userspace datapath with AF_XDP ports.
> Are AF_XDP sockets or io_uring something that CRIU is able to migrate? Just curious.
CRIU handles traffic shaping poorly. I assume all custom qdiscs are gone
after migration. Probably we should refuse to checkpoint any non-default
values to prevent unexpected results.

XDP (and eBPF in general) is not supported by CRIU. We can
checkpoint/restore only simple classic BPF attached via SO_ATTACH_FILTER.

As far as I know, io_uring support has a working PoC from GSoC 2021, but it
has not been merged yet.
>
>>> If you don't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
>>> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
>>>
>>> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
>>> not critical, but would be nice to restore as well, if you're not doing
>>> that already.
>> Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
>> additionally patch the kernel part. The current CRIU implementation does not
>> care about it, but it is not hard to include.
>>
>>>
>>>>
>>>> Best regards, Andrey Zhadchenko
>>>>>
>>>>>>
>>>>>> This patch allows doing so.
>>>>>> For new datapaths I decided to use dp_ifindex in the header as the ifindex,
>>>>>> because it controls the ifindex for other requests too.
>>>>>> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
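
From userspace, usage would look roughly like this (untested sketch based on the
proposed semantics; the genetlink family lookup for OVS_DATAPATH_FAMILY, socket
handling and vport creation are omitted, and the put_attr() helper is a simplified
stand-in):

/*
 * Build an OVS_DP_CMD_NEW request that asks for a specific ifindex by
 * putting it into ovs_header.dp_ifindex, as proposed above.
 */
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/openvswitch.h>

static void put_attr(struct nlmsghdr *nlh, int type, const void *data, int len)
{
	struct nlattr *nla = (struct nlattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + len;
	memcpy((char *)nla + NLA_HDRLEN, data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
}

/* buf must hold at least NLMSG_SPACE(1024) bytes;
 * ovs_dp_family is the resolved "ovs_datapath" genetlink family id. */
static void build_dp_new(void *buf, __u16 ovs_dp_family, const char *name,
			 __u32 upcall_pid, int wanted_ifindex)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl;
	struct ovs_header *ovsh;

	memset(buf, 0, NLMSG_SPACE(1024));
	nlh->nlmsg_type = ovs_dp_family;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN + sizeof(*ovsh));

	genl = NLMSG_DATA(nlh);
	genl->cmd = OVS_DP_CMD_NEW;
	genl->version = OVS_DATAPATH_VERSION;

	ovsh = (struct ovs_header *)((char *)genl + GENL_HDRLEN);
	ovsh->dp_ifindex = wanted_ifindex;	/* 0 would keep the old behaviour */

	put_attr(nlh, OVS_DP_ATTR_NAME, name, strlen(name) + 1);
	put_attr(nlh, OVS_DP_ATTR_UPCALL_PID, &upcall_pid, sizeof(upcall_pid));
}

The same idea applies to OVS_VPORT_CMD_NEW with an OVS_VPORT_ATTR_IFINDEX
attribute for internal vports.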
>>>>>>
>>>>>> The only concern I have is that previously dp_ifindex was not used for
>>>>>> OVS_DP_CMD_NEW requests, so some software may not set it to zero. However,
>>>>>> we have been running this patch at Virtuozzo for 2 years and have not
>>>>>> encountered this problem. I am not sure whether it is worth adding a new
>>>>>> ovs_datapath_attr instead.
>>>>>>
>>>>>>
>>>>>> As a broader solution, a more generic approach is possible: modify
>>>>>> __dev_change_net_namespace() to allow changing ifindexes within the same
>>>>>> netns. However, we would still need to ignore NETIF_F_NETNS_LOCAL, and I am
>>>>>> not sure that all of its users are ready for an ifindex change.
>>>>>> This would indeed be better for CRIU, since we would not need to change every
>>>>>> SDN module to be able to checkpoint/restore it, and it would probably avoid
>>>>>> some bloat. What do you think of this?
>>>>>>
>>>>>> Andrey Zhadchenko (1):
>>>>>> openvswitch: allow specifying ifindex of new interfaces
>>>>>>
>>>>>> include/uapi/linux/openvswitch.h | 4 ++++
>>>>>> net/openvswitch/datapath.c | 16 ++++++++++++++--
>>>>>> net/openvswitch/vport-internal_dev.c | 1 +
>>>>>> net/openvswitch/vport.h | 2 ++
>>>>>> 4 files changed, 21 insertions(+), 2 deletions(-)
>>>>>>
>>>>>
>>>
>