lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <5658758A.3020602@univention.de>
Date:	Fri, 27 Nov 2015 16:23:54 +0100
From:	Philipp Hahn <hahn@...vention.de>
To:	linux-kernel@...r.kernel.org,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Tejun Heo <tj@...nel.org>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"David S. Miller" <davem@...emloft.net>
Subject: Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race
 condition that leads to zero port ID

Hello,

I observed a strange stall in a multi-threaded process (originally
OpenLDAP slapd), which stalls during a call to getaddrinfo() after
upgrading the kernel from 3.16.x to 4.1.12:

> #0  0x00007f8e6d97b87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f8e6d99bfbc in make_request (fd=12, pid=2871, seen_ipv4=<optimized out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
> #2  0x00007f8e6d99c42a in __check_pf (seen_ipv4=0x7f8e6a0969ff, seen_ipv6=0x7f8e6a0969fe, in6ai=0x7f8e6a0969e0, in6ailen=0x7f8e6a0969d8) at ../sysdeps/unix/sysv/linux/check_pf.c:272
> #3  0x00007f8e6d959007 in *__GI_getaddrinfo (name=0x7f8e6a096a80 "0.0.0.0", service=0x400d04 "7636", hints=0x7f8e6a096a50, pai=0x7f8e6a096a48) at ../sysdeps/posix/getaddrinfo.c:2389

(The glibc tries to get the routing table to see via netlink, if it
should support IPv4 and/or IPv6. Neither
<https://sourceware.org/bugzilla/show_bug.cgi?id=12926> nor
<https://sourceware.org/git/?p=glibc.git;a=commit;h=636064eb4c03397c86aa26e489e68f952bd5e53f>
fixed the problem.)

I've been able to reproduce it by writing the attached program, which
calls getaddrinfo() from 10 thread 10^6 times. On my 8-core Intel I see
several threads hanging in the above call after some seconds - you don't
have to wait for the process to terminate):

> gdb --batch --ex 'set pagination off' --ex 'thread apply all bt' -p `pidof a.out` | grep rec
...
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82

Using "git bisect start v4.1.11 v.4.10" I've been able to identify
4e27762417669cb459971635be550eb7b5598286 (v4.1.9-26-g4e27762), which is
a back-port of 1f770c0a09da855a2b51af6d19de97fb955eca85
(v4.2-11228-g1f770c0). I've attached the log.

> # git log -1 4e27762417669cb459971635be550eb7b5598286
> commit 4e27762417669cb459971635be550eb7b5598286
> Author: Herbert Xu <herbert@...dor.apana.org.au>
> Date:   Fri Sep 18 19:16:50 2015 +0800
> 
>     netlink: Fix autobind race condition that leads to zero port ID
>     
>     [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]
>     
>     The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
>     Reset portid after netlink_insert failure") introduced a race
>     condition where if two threads try to autobind the same socket
>     one of them may end up with a zero port ID.  This led to kernel
>     deadlocks that were observed by multiple people.
>     
>     This patch reverts that commit and instead fixes it by introducing
>     a separte rhash_portid variable so that the real portid is only set
>     after the socket has been successfully hashed.
>     
>     Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
>     Reported-by: Tejun Heo <tj@...nel.org>
>     Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
>     Signed-off-by: Herbert Xu <herbert@...dor.apana.org.au>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

I verified 4e27762417669cb459971635be550eb7b5598286~1 does not show the
problem.

v4.1.13 still has the problem.
> # git log --oneline  v4.1..v4.1.13 -- net/netlink/af_netlink.c
> e962218 netlink: Trim skb to alloc size to avoid MSG_TRUNC
> d486236 netlink: Replace rhash_portid with bound
> 4e27762 netlink: Fix autobind race condition that leads to zero port ID
> 65d48c6 netlink, mmap: transform mmap skb into full skb on taps
> d397617 netlink: make sure -EBUSY won't escape from netlink_insert
> b265c30 netlink: don't hold mutex in rcu callback when releasing mmapd ring

v4.3 does not show the problem.
> # git log --oneline  v4.2..v4.3 -- net/netlink/af_netlink.c
> 47191d6 netlink: fix locking around NETLINK_LIST_MEMBERSHIPS
> db65a3a netlink: Trim skb to alloc size to avoid MSG_TRUNC
> da314c9 netlink: Replace rhash_portid with bound
> 1f770c0 netlink: Fix autobind race condition that leads to zero port ID
> 1853c94 netlink, mmap: transform mmap skb into full skb on taps
> 6bb0fef netlink, mmap: fix edge-case leakages in nf queue zero-copy
> a66e365 netlink, mmap: don't walk rx ring on poll if receive queue non-empty
> 0ef7077 netlink: rx mmap: fix POLLIN condition
> 7084a31 netlink: mmap: fix lookup frame position
> 0a6a3a2 netlink: add NETLINK_CAP_ACK socket option

v4.2 does not show the problem (as expected, as the patch isn't in 4.2.0
> # git log --oneline  v4.2..v4.2.6 -- net/netlink/af_netlink.c
> 326ce2c netlink: Trim skb to alloc size to avoid MSG_TRUNC
> e2a3131 netlink: Replace rhash_portid with bound
> 6e32e73 netlink: Fix autobind race condition that leads to zero port ID
> 62f43b5 netlink, mmap: transform mmap skb into full skb on taps ring

I no longer have an idea where to look next, as something outside of
af_netlink.c seems to have fixed the issue for at least 4.3. Any idea?

Kernel: 4.1.12 (Debian)
GLIBC: 2.13-38+deb7u8
CPU: Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz

Thank you and a (hopefully) happy weekend.
Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@...vention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876

View attachment "40059_glic-NL.c" of type "text/x-csrc" (2089 bytes)

View attachment "40059_linux-NL.bisect" of type "text/plain" (1832 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ