Date:   Wed, 19 Dec 2018 18:57:43 +0100
From:   Maksym Planeta <mplaneta@...inf.tu-dresden.de>
To:     netdev@...r.kernel.org
Subject: CRIU: Fail to restore on the same node

Hello,

I'm running a process tree inside network and PID namespaces. I checkpoint it using CRIU (via the RPC API), restore it on another node, checkpoint it again, and then restore the process tree on the original node. Unfortunately, the last operation fails if that final restore happens less than roughly 5 minutes after the first checkpoint operation.

The problem seems to be related to the attempt to restore the IP address of the loopback device inside the network namespace. Here is the relevant part of the log:

(00.032802) 1: Skip veth0/use_optimistic, coincides with default
(00.032805) 1: Skip veth0/use_tempaddr, coincides with default
(00.032835) 1: Try to restore a link 9:1:lo(00.032838) 1: Restoring link lo type 1
(15.034082) 1: Running ip addr restore
RTNETLINK answers: File exists
RTNETLINK answers: File exists
(15.037634) 1: Running ip route restore
RTNETLINK answers: File exists
(15.040505) 1: Running ip route restore
RTNETLINK answers: File exists
(15.043492) 1: Running ip rule flush
(15.046370) 1: Running ip rule delete table local
(15.048892) 1: Running ip rule restore
(15.051935) 1: Running iptables-restore -w for iptables-restore -w
(15.076275) 1: Running ip6tables-restore -w for ip6tables-restore -w
(15.104769) 1: Warn (criu/libnetlink.c:55): ERROR -16 reported by netlink
(15.123717) 1: Error (criu/util.c:1563): Can't wait or bad status: errno=0, status=65280
(15.123912) Error (criu/cr-restore.c:2300): Restoring FAILED.

Netlink returns -16 (-EBUSY) when CRIU feeds the "ifaddr-%u.img" file to "ip addr restore".

Further, I found that the -EBUSY is returned by ctnetlink_change_status when it checks the status against the constant IPS_ASSURED. At this point d == 4 == IPS_ASSURED and status == 10 == IPS_CONFIRMED | IPS_SEEN_REPLY.

Here is the backtrace:

#0  0xffffffffc0d0a4f3 in ctnetlink_change_status (ct=0xffff880428e84000, cda=<optimized out>)
    at net/netfilter/nf_conntrack_netlink.c:1522
#1  0xffffffffc0d0f15c in ctnetlink_change_conntrack (cda=<optimized out>, ct=<optimized out>)
    at net/netfilter/nf_conntrack_netlink.c:1811
#2  ctnetlink_new_conntrack (net=0xffff8804282d9880, ctnl=<optimized out>, skb=<optimized out>, nlh=0x7fffffff,
    cda=0xffffc90002393a40, extack=<optimized out>) at net/netfilter/nf_conntrack_netlink.c:2092
#3  0xffffffffc0bfa4ed in nfnetlink_rcv_msg (skb=0xffff880426d2e600, nlh=0xffff880428f2c800,
    extack=0xffffc90002393b88) at net/netfilter/nfnetlink.c:228
#4  0xffffffff8163f052 in netlink_rcv_skb (skb=0xffff880428e84000, cb=0xa <irq_stack_union+10>)
    at net/netlink/af_netlink.c:2455
#5  0xffffffffc0bfadbf in nfnetlink_rcv (skb=0xffff880428e84000) at net/netfilter/nfnetlink.c:555
#6  0xffffffff8163e88f in netlink_unicast_kernel (ssk=<optimized out>, skb=<optimized out>, sk=<optimized out>)
    at net/netlink/af_netlink.c:1317
#7  netlink_unicast (ssk=0xffff880427e0f000, skb=0xffff880426d2e600, portid=0, nonblock=<optimized out>)
    at net/netlink/af_netlink.c:1343
#8  0xffffffff8163eb3b in netlink_sendmsg (sock=<optimized out>, msg=0xa <irq_stack_union+10>, len=<optimized out>)
    at net/netlink/af_netlink.c:1908
#9  0xffffffff815d3b7e in sock_sendmsg_nosec (msg=<optimized out>, sock=<optimized out>) at ./include/linux/uio.h:202
#10 sock_sendmsg (sock=0xffff880428a9c840, msg=0xffffc90002393ea0) at net/socket.c:652
#11 0xffffffff815d4115 in ___sys_sendmsg (sock=0xffff880428a9c840, msg=<optimized out>, msg_sys=0xffffc90002393ea0,
    flags=<optimized out>, used_address=0x0 <irq_stack_union>, allowed_msghdr_flags=<optimized out>)
    at net/socket.c:2126
#12 0xffffffff815d551c in __sys_sendmsg (fd=<optimized out>, msg=0xa <irq_stack_union+10>, flags=4,
    forbid_cmsg_compat=<optimized out>) at net/socket.c:2164
#13 0xffffffff815d557f in __do_sys_sendmsg (flags=<optimized out>, msg=<optimized out>, fd=<optimized out>)
    at net/socket.c:2173
#14 __se_sys_sendmsg (flags=<optimized out>, msg=<optimized out>, fd=<optimized out>) at net/socket.c:2171
#15 __x64_sys_sendmsg (regs=<optimized out>) at net/socket.c:2171
#16 0xffffffff810041d8 in do_syscall_64 (nr=<optimized out>, regs=0xa <irq_stack_union+10>)
    at arch/x86/entry/common.c:299
#17 0xffffffff81800088 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:238
#18 0x0000000000000000 in ?? ()

I originally posted this issue on the CRIU GitHub issue tracker (https://github.com/checkpoint-restore/criu/issues/581), but was later advised to post it here on the netdev mailing list as well.

-- 
Regards,
Maksym Planeta
