Date:   Thu, 18 Aug 2022 15:26:29 +0300
From:   Andrey Zhadchenko <andrey.zhadchenko@...tuozzo.com>
To:     Ilya Maximets <i.maximets@....org>, netdev@...r.kernel.org
Cc:     dev@...nvswitch.org, brauner@...nel.org, edumazet@...gle.com,
        avagin@...gle.com, alexander.mikhalitsyn@...tuozzo.com,
        kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net,
        ptikhomirov@...tuozzo.com, Aaron Conole <aconole@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 0/1] openvswitch: allow specifying
 ifindex of new interfaces



On 8/18/22 14:06, Ilya Maximets wrote:
> On 8/18/22 01:11, Andrey Zhadchenko wrote:
>>
>>
>> On 8/18/22 01:15, Ilya Maximets wrote:
>>> On 8/17/22 22:35, Andrey Zhadchenko wrote:
>>>>
>>>>
>>>> On 8/17/22 21:19, Ilya Maximets wrote:
>>>>> On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
>>>>>> Hi!
>>>>>>
>>>>>> CRIU currently does not support checkpoint/restore of OVS configurations,
>>>>>> but there have been several requests for it. For example,
>>>>>> https://github.com/lxc/lxc/issues/2909
>>>>>>
>>>>>> The main problem is the ifindexes of newly created interfaces. We really
>>>>>> need to preserve them after restore. The current openvswitch API does not
>>>>>> allow specifying an ifindex. Most of the time we can just create an
>>>>>> interface via generic netlink requests and plug it into ovs, but datapaths
>>>>>> (generally any OVS_VPORT_TYPE_INTERNAL) can only be created via
>>>>>> openvswitch requests, which do not support selecting an ifindex.
>>>>>
>>>>> Hmm.  I assume you restore network interfaces by re-creating them
>>>>> on the target system, but I'm curious how you restore the upcall PID.
>>>>> Are you somehow re-creating the netlink socket with the same PID?
>>>>> If that is not done, no traffic will be able to flow through OVS
>>>>> anyway until you remove/re-add the port in userspace or restart OVS.
>>>>> Or am I missing something?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Yes, CRIU is able to restore the socket's nl_pid. We get it via
>>>> NDIAG_PROTO_ALL netlink protocol requests (see net/netlink/diag.c). The
>>>> upcall pid is exported by get requests via OVS_VPORT_ATTR_UPCALL_PID.
>>>> So everything is fine here.
>>>
>>> I didn't dig deep into how that works, but sounds interesting.
>>> Thanks for the pointers!
>>>
>>>>
>>>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>>>> mostly basic ones like veth plugging and several containers in a network.
>>>> We mainly supported the Weave Net k8s SDN, which is based on ovs but does
>>>> not use its daemon.
>>>>
>>>> Maybe if this is merged and people start using it we will find more
>>>> problems with checkpoint/restore, but for now the only problem is the
>>>> volatile ifindex.
>>>
>>> The current implementation, even with ifindexes sorted out, will not work
>>> for at least one reason on recent versions of OVS.  Since last year OVS
>>> doesn't use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports
>>> OVS_DP_ATTR_PER_CPU_PIDS instead.  It's a datapath-wide CPU ID to PID
>>> mapping for the per-CPU upcall dispatch mode.  It is used by default
>>> starting with OVS 2.16.
>>>
>>> So, you need to make sure you're correctly restoring 'user_features' and
>>> OVS_DP_ATTR_PER_CPU_PIDS.  The problem here is that OVS_DP_ATTR_PER_CPU_PIDS
>>> is currently not dumped to userspace via GET requests, simply because
>>> ovs-vswitchd has no use for it.  So, you'll need to add that as well.
>> Thanks, this is very important! I will make v2 soon.
>>
>>>
>>> And there could be some issues when starting OVS from a checkpoint created
>>> on a system with different number of CPU cores.  Traffic will not be broken,
>>> but performance may be affected, and there might be some kernel warnings.
>> Migrating to another type of CPU is usually a challenge due to different
>> CPUID values and some other problems (do we handle cpu cgroup values if
>> ncpus changed? no idea, honestly). Anyway, thanks for pointing that out.
> 
> TBH, it is a challenge just to figure out the CPU IDs on the system you're
> running on; migration to a different CPU topology will indeed be a major pain
> if you want to preserve performance characteristics on the new setup.  It's
> probably much easier to re-start ovs-vswitchd after you have migrated the
> kernel bits.  Though that doesn't seem like something CRIU would like to do,
> and I understand that.  Current ovs-vswitchd will not react to sudden changes
> in CPU topology/affinity.  Reacting to changes in CPU affinity, though, is
> something we're exploring, because it can happen in k8s-like environments
> with different performance monitoring/tweaking tools involved.
> 
> Regarding more "complex" scenarios, I should also mention the qdiscs that OVS
> creates and attaches to the interfaces it manages.  These could be for the
> purpose of QoS, ingress policing, or offloading to TC flower.  OVS may be
> confused if these qdiscs suddenly disappear.  This may cause some traffic to
> stop flowing or be directed to where it shouldn't be.  I don't know if CRIU
> covers that part.
> 
> There are also basic XDP programs that could be attached to interfaces, along
> with registered umem blocks, if users are running the userspace datapath with
> AF_XDP ports.  Are AF_XDP sockets or io_uring something that CRIU is able to
> migrate?  Just curious.

CRIU handles traffic shaping poorly. I assume all custom qdiscs are gone 
after migration. Probably we should refuse to checkpoint any non-default 
values to prevent unexpected results.
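For what it's worth, detecting non-default qdiscs from userspace only takes a
plain rtnetlink RTM_GETQDISC dump. A rough sketch of the message layout
(constants taken from <linux/rtnetlink.h>; the `live_dump()` helper is
illustrative and Linux-only, and a real dump may need to loop over several
recv() calls until NLMSG_DONE):

```python
import socket
import struct

RTM_GETQDISC = 38           # from <linux/rtnetlink.h>
NLM_F_REQUEST = 0x1
NLM_F_DUMP = 0x300          # NLM_F_ROOT | NLM_F_MATCH
NLMSG_DONE = 3
TCA_KIND = 1                # string attribute: qdisc kind

def qdisc_dump_request(seq=1):
    # struct tcmsg: family, 3 pad bytes, ifindex, handle, parent, info
    tcm = struct.pack("BxxxiIII", socket.AF_UNSPEC, 0, 0, 0, 0)
    # struct nlmsghdr: len, type, flags, seq, pid
    hdr = struct.pack("IHHII", 16 + len(tcm), RTM_GETQDISC,
                      NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return hdr + tcm

def qdisc_kinds(data):
    """Extract TCA_KIND strings (e.g. 'fq_codel', 'htb') from a dump reply."""
    kinds, off = [], 0
    while off + 16 <= len(data):
        ln, typ, _flags, _seq, _pid = struct.unpack_from("IHHII", data, off)
        if typ == NLMSG_DONE:
            break
        attrs = data[off + 16 + 20:off + ln]   # skip nlmsghdr + tcmsg
        a = 0
        while a + 4 <= len(attrs):
            alen, atyp = struct.unpack_from("HH", attrs, a)
            if alen < 4:
                break
            if atyp == TCA_KIND:
                kinds.append(attrs[a + 4:a + alen].rstrip(b"\0").decode())
            a += (alen + 3) & ~3               # attributes are 4-byte aligned
        off += (ln + 3) & ~3                   # messages are 4-byte aligned
    return kinds

def live_dump():
    # Linux-only; unprivileged NETLINK_ROUTE dump of all qdiscs.
    s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    s.bind((0, 0))
    s.send(qdisc_dump_request())
    return qdisc_kinds(s.recv(65536))
```

Comparing the returned kinds against the kernel defaults would be one way to
refuse checkpointing anything non-default, as suggested above.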

XDP (and eBPF in general) is not supported by CRIU. We can 
checkpoint/restore only simple classic BPF attached via SO_ATTACH_FILTER.
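To illustrate the classic-BPF case: a filter attached with SO_ATTACH_FILTER can
be read back (via SO_GET_FILTER) and re-attached on restore. A minimal,
Linux-only sketch of the attach side, with struct layouts hand-packed from
<linux/filter.h> (the drop-all program is just a placeholder example):

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26       # Linux-only, from <asm-generic/socket.h>

# One classic BPF instruction: BPF_RET | BPF_K with k = 0 ("drop everything").
# struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }  -> 8 bytes
DROP_ALL = struct.pack("HBBI", 0x06, 0, 0, 0)

def attach_drop_all(sock):
    """Attach a one-instruction cBPF program to a socket."""
    insns = ctypes.create_string_buffer(DROP_ALL, len(DROP_ALL))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    # native struct packing inserts the padding before the pointer for us.
    fprog = struct.pack("HL", 1, ctypes.addressof(insns))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
    return True

# Attaching to your own socket is unprivileged; it filters only its receives.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
attach_drop_all(s)
s.close()
```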

As far as I know, there is a working PoC for io_uring support from GSoC 
2021, but it has not been merged yet.


> 
>>> If you won't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
>>> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
>>>
>>> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
>>> not critical, but would be nice to restore as well, if you're not doing
>>> that already.
>> Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
>> additionally patch the kernel part. The current CRIU implementation does not
>> care about it, but it is not hard to include.
>>
>>>
>>>>
>>>> Best regards, Andrey Zhadchenko
>>>>>
>>>>>>
>>>>>> This patch allows doing so.
>>>>>> For new datapaths I decided to use dp_ifindex in the header as the
>>>>>> ifindex, because it controls the ifindex for other requests too.
>>>>>> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
>>>>>>
>>>>>> The only concern I have is that previously dp_ifindex was not used for
>>>>>> OVS_DP_CMD_NEW requests and some software may not set it to zero. However,
>>>>>> we have been running this patch at Virtuozzo for 2 years and have not
>>>>>> encountered this problem. Not sure if it is worth adding a new
>>>>>> ovs_datapath_attr instead.
>>>>>>
>>>>>>
>>>>>> As a broader solution, another generic approach is possible: modify
>>>>>> __dev_change_net_namespace() to allow changing ifindexes within the same
>>>>>> netns. Yet we would still need to ignore NETIF_F_NETNS_LOCAL, and I am
>>>>>> not sure that all its users are ready for an ifindex change.
>>>>>> This would indeed be better for CRIU, so we wouldn't need to change every
>>>>>> SDN module to be able to checkpoint/restore it. And it would probably
>>>>>> avoid some bloat. What do you think of this?
>>>>>>
>>>>>> Andrey Zhadchenko (1):
>>>>>>      openvswitch: allow specifying ifindex of new interfaces
>>>>>>
>>>>>>     include/uapi/linux/openvswitch.h     |  4 ++++
>>>>>>     net/openvswitch/datapath.c           | 16 ++++++++++++++--
>>>>>>     net/openvswitch/vport-internal_dev.c |  1 +
>>>>>>     net/openvswitch/vport.h              |  2 ++
>>>>>>     4 files changed, 21 insertions(+), 2 deletions(-)
>>>>>>
>>>>>
>>>
> 
