Message-ID: <AM4PR0501MB1940130EB22A118F3524DA54DBB10@AM4PR0501MB1940.eurprd05.prod.outlook.com>
Date:   Thu, 3 Aug 2017 15:48:03 +0000
From:   Ilan Tayari <ilant@...lanox.com>
To:     Florian Westphal <fw@...len.de>
CC:     Steffen Klassert <steffen.klassert@...unet.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Yevgeny Kliteynik <kliteyn@...lanox.com>,
        "Yossi Kuperman" <yossiku@...lanox.com>,
        Boris Pismenny <borisp@...lanox.com>,
        "Yossef Efraim" <yossefe@...lanox.com>
Subject: XFRM pcpu cache issue

Hi Florian,

I did some debugging on the regression I told you about the other day.

Steps and Symptoms:
1. Set up a host-to-host IPsec tunnel (or transport mode; it doesn't matter)
2. Ping over IPsec, or send any other traffic that populates the pcpu cache
3. Join a multicast (MC) group, then leave it
4. Ping again from the same CPU as before -> no traffic egresses the machine at all

Pinging from another CPU (with a clean cache) works.
Clearing the pcpu cache also makes it work again.

With a little more digging I found that when the cache is first populated (step 2), xdst->u.dst.dev and xdst->u.dst.path->dev both point to the same device (my intended device).
At step 4, the cache has the same xdst->u.dst.dev, but xdst->u.dst.path->dev points to the 'lo' device.

Using a hardware breakpoint, I found what changes it. This is the call stack:

#0  0xffffffff8158bc09 in dst_dev_put at net/core/dst.c:172
#1  0xffffffff815bff14 in rt_cache_route at net/ipv4/route.c:1367
#2  0xffffffff815c0005 in rt_set_nexthop at net/ipv4/route.c:1468
#3  0xffffffff815c25b9 in __mkroute_output at net/ipv4/route.c:2262
#4  ip_route_output_key_hash_rcu at net/ipv4/route.c:2454
#5  0xffffffff815c2b0e in ip_route_output_key_hash at net/ipv4/route.c:2289
#6  0xffffffff815f02e9 in __ip_route_output_key at ./include/net/route.h:125
#7  ip_route_connect at ./include/net/route.h:297
#8  __ip4_datagram_connect at net/ipv4/datagram.c:51
#9  0xffffffff815f048c in ip4_datagram_connect at net/ipv4/datagram.c:92
#10 0xffffffff815ff45e in inet_dgram_connect at net/ipv4/af_inet.c:540
#11 0xffffffff81563207 in SYSC_connect at net/socket.c:1628
#12 0xffffffff81564b8e in SyS_connect at net/socket.c:1609
#13 0xffffffff816aa5f7 in entry_SYSCALL_64_fastpath at arch/x86/entry/entry_64.S:203

The line there is telling:
	dst->dev = dev_net(dst->dev)->loopback_dev;

So the dev is replaced when the first packet is sent *after* the MC join/leave, not during the join/leave itself.
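
To make the observation concrete, here is a toy user-space model of the two pointers involved. It is a sketch, not kernel code: the names (model_dev, model_dst, model_dst_dev_put, model_path_dev_matches) are made up for illustration, and only the single assignment quoted above is taken from dst_dev_put. It assumes the cached xdst keeps a pointer to the route dst as its path, which is what the debugger output suggests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct dst_entry. */
struct model_dev {
    const char *name;
};

struct model_dst {
    struct model_dev *dev;   /* models dst->dev */
    struct model_dst *path;  /* models dst->path; NULL for a plain route */
};

/* Models only the quoted line: the dst is repointed at the loopback device. */
void model_dst_dev_put(struct model_dst *dst, struct model_dev *loopback)
{
    dst->dev = loopback;
}

/* True while the cached bundle and the route underneath it agree on the device. */
bool model_path_dev_matches(const struct model_dst *xdst)
{
    return xdst->path != NULL && xdst->dev == xdst->path->dev;
}
```

With a route and a bundle whose path points at that route, both on the same device, model_path_dev_matches() is true. After model_dst_dev_put() runs on the route, it becomes false while the bundle's own dev is unchanged, which matches the state seen at step 4.
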
For reference, in step 3 above, we do:
	socket(AF_INET,SOCK_DGRAM, IPPROTO_UDP)
	setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
	setsockopt(SOL_IP, IP_MULTICAST_TTL, 1)
	setsockopt(SOL_IP, IP_MULTICAST_LOOP, 1)	
	setsockopt(SOL_IP, IP_MULTICAST_IF, <ip of device>)
	setsockopt(SOL_IP, IP_ADD_MEMBERSHIP, <group>)
	setsockopt(SOL_IP, IP_MULTICAST_TTL, 1)
	bind(<group>, <some port>)
The process then exits after a few seconds.

I am using net-next from around two weeks ago.

I'll continue digging, but would love to hear your opinion and maybe suggestions on where to look next.

Ilan.
