lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 Aug 2022 02:11:35 +0300
From:   Andrey Zhadchenko <andrey.zhadchenko@...tuozzo.com>
To:     Ilya Maximets <i.maximets@....org>, netdev@...r.kernel.org
Cc:     dev@...nvswitch.org, brauner@...nel.org, edumazet@...gle.com,
        avagin@...gle.com, alexander.mikhalitsyn@...tuozzo.com,
        kuba@...nel.org, pabeni@...hat.com, davem@...emloft.net,
        ptikhomirov@...tuozzo.com, Aaron Conole <aconole@...hat.com>
Subject: Re: [ovs-dev] [PATCH net-next 0/1] openvswitch: allow specifying
 ifindex of new interfaces



On 8/18/22 01:15, Ilya Maximets wrote:
> On 8/17/22 22:35, Andrey Zhadchenko wrote:
>>
>>
>> On 8/17/22 21:19, Ilya Maximets wrote:
>>> On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
>>>> Hi!
>>>>
>>>> CRIU currently do not support checkpoint/restore of OVS configurations, but
>>>> there was several requests for it. For example,
>>>> https://github.com/lxc/lxc/issues/2909
>>>>
>>>> The main problem is ifindexes of newly created interfaces. We realy need to
>>>> preserve them after restore. Current openvswitch API does not allow to
>>>> specify ifindex. Most of the time we can just create an interface via
>>>> generic netlink requests and plug it into ovs but datapaths (generally any
>>>> OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch requests which
>>>> do not support selecting ifindex.
>>>
>>> Hmm.  Assuming you restored network interfaces by re-creating them
>>> on a target system, but I'm curious how do you restore the upcall PID?
>>> Are you somehow re-creating the netlink socket with the same PID?
>>> If that will not be done, no traffic will be able to flow through OVS
>>> anyway until you remove/re-add the port in userspace or re-start OVS.
>>> Or am I missing something?
>>>
>>> Best regards, Ilya Maximets.
>>
>> Yes, CRIU is able to restore socket nl_pid. We get it via NDIAG_PROTO_ALL
>> netlink protocol requests (see net/netlink/diag.c)  Upcall pid is exported
>> by get requests via OVS_VPORT_ATTR_UPCALL_PID.
>> So everything is fine here.
> 
> I didn't dig deep into how that works, but sounds interesting.
> Thanks for the pointers!
> 
>>
>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>> mostly basic ones like veth plugging and several containers in network. We
>> mainly supported Weave Net k8s SDN  which is based on ovs but do not use its
>> daemon.
>>
>> Maybe if this is merged and people start use this we will find more problems
>> with checkpoint/restore, but for now the only problem is volatile ifindex.
> 
> Current implementation even with ifindexes sorted out will not work for
> at least one reason for recent versions of OVS.  Since last year OVS doesn't
> use OVS_VPORT_ATTR_UPCALL_PID if kernel supports OVS_DP_ATTR_PER_CPU_PIDS
> instead.  It's a datapath-wide CPU ID to PID mapping for per-CPU upcall
> dispatch mode.  It is used by default starting with OVS 2.16. >
> So, you need to make sure you're correctly restoring 'user_features' and
> the OVS_DP_ATTR_PER_CPU_PIDS.  Problem here is that OVS_DP_ATTR_PER_CPU_PIDS
> currently not dumped to userpsace via GET request, simply because ovs-vswitchd
> has no use for it.  So, you'll need to add that as well.
Thanks, this is very important! I will make v2 soon.

> 
> And there could be some issues when starting OVS from a checkpoint created
> on a system with different number of CPU cores.  Traffic will not be broken,
> but performance may be affected, and there might be some kernel warnings.
Migrating to another type of CPU is a challenge usually due to different 
CPUID and some other problems (do we handle cpu cgroup values if ncpus 
changed? no idea honestly). Anyway thanks for pointing that out.

> 
> If you won't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
> 
> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
> not critical, but would be nice to restore as well, if you're not doing
> that already.
Looks like ovs_dp_cmd_fill_info() already fills it so no need to 
additionally patch kernel part. Current CRIU implementation does not 
care about it, but it is not hard to include.

> 
>>
>> Best regards, Andrey Zhadchenko
>>>
>>>>
>>>> This patch allows to do so.
>>>> For new datapaths I decided to use dp_infindex in header as infindex
>>>> because it control ifindex for other requests too.
>>>> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
>>>>
>>>> The only concern I have is that previously dp_ifindex was not used for
>>>> OVS_DP_VMD_NEW requests and some software may not set it to zero. However
>>>> we have been running this patch at Virtuozzo for 2 years and have not
>>>> encountered this problem. Not sure if it is worth to add new
>>>> ovs_datapath_attr instead.
>>>>
>>>>
>>>> As a broader solution, another generic approach is possible: modify
>>>> __dev_change_net_namespace() to allow changing ifindexes within the same
>>>> netns. Yet we will still need to ignore NETIF_F_NETNS_LOCAL and I am not
>>>> sure that all its users are ready for ifindex change.
>>>> This will be indeed better for CRIU so we won't need to change every SDN
>>>> module to be able to checkpoint/restore it. And probably avoid some bloat.
>>>> What do you think of this?
>>>>
>>>> Andrey Zhadchenko (1):
>>>>     openvswitch: allow specifying ifindex of new interfaces
>>>>
>>>>    include/uapi/linux/openvswitch.h     |  4 ++++
>>>>    net/openvswitch/datapath.c           | 16 ++++++++++++++--
>>>>    net/openvswitch/vport-internal_dev.c |  1 +
>>>>    net/openvswitch/vport.h              |  2 ++
>>>>    4 files changed, 21 insertions(+), 2 deletions(-)
>>>>
>>>
> 

Powered by blists - more mailing lists